SYNTHETIC PLANT GENES AND METHOD FOR PREPARATION



BACKGROUND OF THE INVENTION 



The present invention relates to genetic engineering and more particularly to plant transformation in
which a plant is transformed to express a heterologous gene.

Although great progress has been made in recent years with respect to transgenic plants which express
foreign proteins such as herbicide resistant enzymes and viral coat proteins, very little is known about the
major factors affecting expression of foreign genes in plants. Several potential factors could be responsible
in varying degrees for the level of protein expression from a particular coding sequence. The level of a
particular mRNA in the cell is certainly a critical factor.

The potential causes of low steady state levels of mRNA due to the nature of the coding sequence are
many. First, full length RNA synthesis might not occur at a high frequency. This could, for example, be
caused by the premature termination of RNA during transcription or by unexpected mRNA processing
during transcription. Second, full length RNA could be produced but then processed (splicing, polyA
addition) in the nucleus in a fashion that creates a nonfunctional mRNA. If the RNA is properly synthesized,
terminated and polyadenylated, it then can move to the cytoplasm for translation. In the cytoplasm, mRNAs
have distinct half lives that are determined by their sequences and by the cell type in which they are
expressed. Some RNAs are very short-lived and some are much more long-lived. In addition, there is an
effect, whose magnitude is uncertain, of translational efficiency on mRNA half-life. Furthermore, every RNA
molecule folds into a particular structure, or perhaps family of structures, which is determined by its
sequence. The particular structure of any RNA might lead to greater or lesser stability in the cytoplasm.
Structure per se is probably also a determinant of mRNA processing in the nucleus. Unfortunately, it is
impossible to predict, and nearly impossible to determine, the structure of any RNA (except for tRNA) in
vitro or in vivo. However, it is likely that dramatically changing the sequence of an RNA will have a large
effect on its folded structure. It is likely that structure per se or particular structural features also have a role
in determining RNA stability.

Some particular sequences and signals have been identified in RNAs that have the potential for having
a specific effect on RNA stability. This section summarizes what is known about these sequences and
signals. These identified sequences often are A+T rich, and thus are more likely to occur in an A+T rich
coding sequence such as a B.t. gene. The sequence motif ATTTA (or AUUUA as it appears in RNA) has
been implicated as a destabilizing sequence in mammalian cell mRNA (Shaw and Kamen, 1986). No
analysis of the function of this sequence in plants has been done. Many short-lived mRNAs have A+T rich
3' untranslated regions, and these regions often have the ATTTA sequence, sometimes present in multiple
copies or as multimers (e.g., ATTTATTTA...). Shaw and Kamen showed that the transfer of the 3' end of an
unstable mRNA to a stable RNA (globin or VA1) decreased the stable RNA's half life dramatically. They
further showed that a pentamer of ATTTA had a profound destabilizing effect on a stable message, and that
this signal could exert its effect whether it was located at the 3' end or within the coding sequence.
However, the number of ATTTA sequences and/or the sequence context in which they occur also appear to
be important in determining whether they function as destabilizing sequences. Shaw and Kamen showed
that a trimer of ATTTA had much less effect than a pentamer on mRNA stability and a dimer or a monomer
had no effect on stability (Shaw and Kamen, 1987). Note that multimers of ATTTA such as a pentamer
automatically create an A+T rich region. This was shown to be a cytoplasmic effect, not nuclear. In other
unstable mRNAs, the ATTTA sequence may be present in only a single copy, but it is often contained in an
A+T rich region. From the animal cell data collected to date, it appears that ATTTA at least in some
contexts is important in stability, but it is not yet possible to predict which occurrences of ATTTA are
destabilizing elements or whether any of these effects are likely to be seen in plants.
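
The kind of sequence analysis discussed above can be illustrated with a short script. The following is a minimal sketch, not part of the original disclosure: the function names find_attta and longest_attta_multimer are hypothetical, and a multimer is assumed to be an overlapping run of the form ATTTATTTA, as in the example given in the text. It only locates candidate motifs; as noted above, sequence alone does not establish that any occurrence is destabilizing.

```python
import re


def find_attta(seq):
    """Return 0-based start positions of every ATTTA, overlaps included."""
    seq = seq.upper().replace("U", "T")  # accept RNA (AUUUA) as well as DNA
    # A lookahead match is zero-width, so overlapping motifs are all reported.
    return [m.start() for m in re.finditer(r"(?=ATTTA)", seq)]


def longest_attta_multimer(seq):
    """Return the unit count of the longest overlapping run (ATTTATTTA = 2)."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for m in re.finditer(r"A(?:TTTA)+", seq):
        # Each unit after the shared leading A contributes four bases.
        units = (len(m.group()) - 1) // 4
        best = max(best, units)
    return best
```

For instance, find_attta("ATTTATTTA") reports both overlapping copies, and longest_attta_multimer on the same input reports a dimer.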

Some studies on mRNA degradation in animal cells also indicate that RNA degradation may begin in
some cases with nucleolytic attack in A+T rich regions. It is not clear if these cleavages occur at ATTTA
sequences. There are also examples of mRNAs that have differential stability depending on the cell type in
which they are expressed or on the stage within the cell cycle at which they are expressed. For example,
histone mRNAs are stable during DNA synthesis but unstable if DNA synthesis is disrupted. The 3' end of
some histone mRNAs seems to be responsible for this effect (Pandey and Marzluff, 1987). It does not
appear to be mediated by ATTTA, nor is it clear what controls the differential stability of this mRNA.
Another example is the differential stability of IgG mRNA in B lymphocytes during B cell maturation
(Genovese and Milcarek, 1988). A final example is the instability of a mutant beta-thalassemic globin mRNA.
In bone marrow cells, where this gene is normally expressed, the mutant mRNA is unstable, while the
wild-type mRNA is stable. When the mutant gene is expressed in HeLa or L cells in vitro, the mutant mRNA
shows no instability (Lim et al., 1988). These examples all provide evidence that mRNA stability can be
mediated by cell type or cell cycle specific factors. Furthermore, this type of instability is not yet associated
with specific sequences. Given these uncertainties, it is not possible to predict which RNAs are likely to be
unstable in a given cell. In addition, even the ATTTA motif may act differentially depending on the nature of
the cell in which the RNA is present. Shaw and Kamen (1987) have reported that activation of protein kinase
C can block degradation mediated by ATTTA.

The addition of a polyadenylate string to the 3' end is common to most eucaryotic mRNAs, both plant
and animal. The currently accepted view of polyA addition is that the nascent transcript extends beyond the
mature 3' terminus. Contained within this transcript are signals for polyadenylation and proper 3' end
formation. This processing at the 3' end involves cleavage of the mRNA and addition of polyA to the mature
3' end. By searching for consensus sequences near the polyA tract in both plant and animal mRNAs, it has
been possible to identify consensus sequences that apparently are involved in polyA addition and 3' end
cleavage. The same consensus sequences seem to be important to both of these processes. These signals
are typically a variation on the sequence AATAAA. In animal cells, some variants of this sequence that are
functional have been identified; in plant cells there seems to be an extended range of functional sequences
(Wickens and Stephenson, 1984; Dean et al., 1986). Because all of these consensus sequences are
variations on AATAAA, they all are A+T rich sequences. This sequence is typically found 15 to 20 bp
before the polyA tract in a mature mRNA. Experiments in animal cells indicate that this sequence is
involved in both polyA addition and 3' maturation. Site directed mutations in this sequence can disrupt
these functions (Conway and Wickens, 1988; Wickens et al., 1987). However, it has also been observed that
sequences up to 50 to 100 bp 3' to the putative polyA signal are also required; i.e., a gene that has a
normal AATAAA but has been replaced or disrupted downstream does not get properly polyadenylated (Gil
and Proudfoot, 1984; Sadofsky and Alwine, 1984; McDevitt et al., 1984). That is, the polyA signal itself is
not sufficient for complete and proper processing. It is not yet known what specific downstream sequences
are required in addition to the polyA signal, or if there is a specific sequence that has this function.
Therefore, sequence analysis can only identify potential polyA signals.

In naturally occurring mRNAs that are normally polyadenylated, it has been observed that disruption of
this process, either by altering the polyA signal or other sequences in the mRNA, can have profound effects
on the level of functional mRNA. This has been observed in several naturally occurring mRNAs, with
results that are gene specific so far. There are no general rules that can be derived yet from the study of
mutants of these natural genes, and no rules that can be applied to heterologous genes. Below are four
examples:

1. In a globin gene, absence of a proper polyA site leads to improper termination of transcription. It is
likely, but not proven, that the improperly terminated RNA is nonfunctional and unstable (Proudfoot et al.,
1987).

2. In a globin gene, absence of a functional polyA signal can lead to a 100-fold decrease in the level
of mRNA accumulation (Proudfoot et al., 1987).

3. A globin gene polyA site was placed into the 3' ends of two different histone genes. The histone
genes contain a secondary structure (stem-loop) near their 3' ends. The amount of properly polyadenylated
histone mRNA produced from these chimeras decreased as the distance between the stem-loop and the
polyA site increased. Also, the two histone genes produced greatly different levels of properly polyadenylated
mRNA. This suggests an interaction between the polyA site and other sequences on the mRNA that can
modulate mRNA accumulation (Pandey and Marzluff, 1987).

4. The soybean leghemoglobin gene has been cloned into HeLa cells, and it has been determined
that this plant gene contains a "cryptic" polyadenylation signal that is active in animal cells, but is not
utilized in plant cells. This leads to the production of a new polyadenylated mRNA that is nonfunctional.
This again shows that analysis of a gene in one cell type cannot predict its behavior in alternative cell types
(Wiebauer et al., 1988).

From these examples, it is clear that in natural mRNAs proper polyadenylation is important in mRNA
accumulation, and that disruption of this process can affect mRNA levels significantly. However, insufficient
knowledge exists to predict the effect of changes in a normal gene. In a heterologous gene, where we do
not know if the putative polyA sites (consensus sequences) are functional, it is even harder to predict the
consequences. However, it is possible that the putative sites identified are dysfunctional. That is, these sites
may not act as proper polyA sites, but instead function as aberrant sites that give rise to unstable mRNAs.

In animal cell systems, AATAAA is by far the most common signal identified in mRNAs upstream of the
polyA, but at least four variants have also been found (Wickens and Stephenson, 1984). In plants, not nearly
so much analysis has been done, but it is clear that multiple sequences similar to AATAAA can be used.
The plant sites below called major or minor refer only to the study of Dean et al. (1986), which analyzed
only three types of plant gene. The designation of polyadenylation sites as major or minor refers only to the
frequency of their occurrence as functional sites in naturally occurring genes that have been analyzed. In
the case of plants this is a very limited database. It is hard to predict with any certainty that a site
designated major or minor is more or less likely to function partially or completely when found in a
heterologous gene such as B.t.



PA      AATAAA    Major consensus site
P1A     AATAAT    Major plant site
P2A     AACCAA    Minor plant site
P3A     ATATAA    Minor plant site
P4A     AATCAA    Minor plant site
P5A     ATACTA    Minor plant site
P6A     ATAAAA    Minor plant site
P7A     ATGAAA    Minor plant site
P8A     AAGCAT    Minor plant site
P9A     ATTAAT    Minor plant site
P10A    ATACAT    Minor plant site
P11A    AAAATA    Minor plant site
P12A    ATTAAA    Minor animal site
P13A    AATTAA    Minor animal site
P14A    AATACA    Minor animal site
P15A    CATAAA    Minor animal site
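
A scan for the putative polyadenylation signals listed above might look like the following minimal sketch. It is illustrative only: the names POLYA_SIGNALS and find_polya_signals are hypothetical, and the label for AATAAT is read here as "major plant site", which is an interpretation of a poorly legible entry. As the text stresses, a hit identifies only a potential signal and says nothing about whether the site functions in a given cell.

```python
# Hexamers and site designations taken from the list above.
# The AATAAT label is an assumed reading of a poorly legible entry.
POLYA_SIGNALS = {
    "AATAAA": "major consensus site",
    "AATAAT": "major plant site",
    "AACCAA": "minor plant site",
    "ATATAA": "minor plant site",
    "AATCAA": "minor plant site",
    "ATACTA": "minor plant site",
    "ATAAAA": "minor plant site",
    "ATGAAA": "minor plant site",
    "AAGCAT": "minor plant site",
    "ATTAAT": "minor plant site",
    "ATACAT": "minor plant site",
    "AAAATA": "minor plant site",
    "ATTAAA": "minor animal site",
    "AATTAA": "minor animal site",
    "AATACA": "minor animal site",
    "CATAAA": "minor animal site",
}


def find_polya_signals(seq):
    """Return (position, hexamer, designation) for each listed hexamer found."""
    seq = seq.upper()
    hits = []
    # Slide a six-base window over the sequence; overlapping hits are kept.
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        if hexamer in POLYA_SIGNALS:
            hits.append((i, hexamer, POLYA_SIGNALS[hexamer]))
    return hits
```

Overlapping hexamers are reported separately, so a stretch such as AATAAAA yields both AATAAA and ATAAAA.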



Another type of RNA processing that occurs in the nucleus is intron splicing. Nearly all of the work on
intron processing has been done in animal cells, but some data is emerging from plants. Intron processing
depends on proper 5' and 3' splice junction sequences. Consensus sequences for these junctions have
been derived for both animal and plant mRNAs, but only a few nucleotides are known to be invariant.
Therefore, it is hard to predict with any certainty whether a putative splice junction is functional or partially
functional based solely on sequence analysis. In particular, the only invariant nucleotides are GT at the 5'
end of the intron and AG at the 3' end of the intron. In plants, at every nearby position, either within the
intron or in the exon flanking the intron, all four nucleotides can be found, although some positions show
some nucleotide preference (Brown, 1986; Hanley and Schuler, 1988).

A plant intron has been moved from a patatin gene into a GUS gene. To do this, site directed
mutagenesis was performed to introduce new restriction sites, and this mutagenesis changed several
nucleotides in the intron and exon sequences flanking the GT and AG. This intron still functioned properly,
indicating the importance of the GT and AG and the flexibility at other nucleotide positions. There are of
course many occurrences of GT and AG in all genes that do not function as intron splice junctions, so there
must be some other sequence or structural features that identify splice junctions. In plants, one such feature
appears to be base composition per se. Wiebauer et al. (1988) and Goodall et al. (1988) have analyzed
plant introns and exons and found that exons have ~50% A+T while introns have ~70% A+T. Goodall et
al. (1988) also created an artificial plant intron that has consensus 5' and 3' splice junctions and a random
A+T rich internal sequence. This intron was spliced correctly in plants. When the internal segment was
replaced by a G+C rich sequence, splicing efficiency was drastically reduced. These two examples
demonstrate that intron recognition in plants may depend on very general features: splice junctions that
have a great deal of sequence diversity and A+T richness of the intron itself. This, of course, makes it
difficult to predict from sequence alone whether any particular sequence is likely to function as an active or
partially active intron for RNA processing.

B.t. genes, being A+T rich, contain numerous stretches of various lengths that have 70% or greater
A+T. The number of such stretches identified by sequence analysis depends on the length of sequence
scanned.
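
The dependence on scan length can be made concrete with a sliding-window calculation. The following is a minimal sketch under stated assumptions: the function names at_fraction and at_rich_windows and the default window of 20 bases are hypothetical choices made for illustration, while the 70% threshold is the figure given in the text.

```python
def at_fraction(seq):
    """Fraction of bases in seq that are A or T (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("A") + seq.count("T")) / len(seq)


def at_rich_windows(seq, window=20, threshold=0.70):
    """Return start positions of windows whose A+T fraction meets threshold."""
    seq = seq.upper()
    starts = []
    for i in range(len(seq) - window + 1):
        if at_fraction(seq[i:i + window]) >= threshold:
            starts.append(i)
    return starts
```

With the toy sequence AAAAAGGGGG, a five-base window flags two positions while a ten-base window flags none, which is the window-length effect the paragraph describes.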

As for polyadenylation described above, there are complications in predicting what sequences might be
utilized as splice sites in any given gene. First, many naturally occurring genes have alternative splicing
pathways that create alternative combinations of exons in the final mRNA (Gallega and Nadal-Ginard, 1988;
Helfman and Ricci, 1988; Tsurushita and Korn, 1989). That is, some splice junctions are apparently
recognized under some circumstances or in certain cell types, but not in others. The rules governing this
are not understood. In addition, there can be an interaction between processing paths such that utilization of
a particular polyadenylation site can interfere with splicing at a nearby splice site and vice versa (Adami and
Nevins, 1988; Brady and Wold, 1988; Marzluff and Pandey, 1988). Again no predictive rules are available.
Also, sequence changes in a gene can drastically alter the utilization of particular splice junctions. For
example, in a bovine growth hormone gene, small deletions in an exon a few hundred bases downstream of
an intron cause the splicing efficiency of the intron to drop from greater than 95% to less than 2%
(essentially nonfunctional). Other deletions, however, have essentially no effect (Hampson and Rottman,
1988). Finally, a variety of in vitro and in vivo experiments indicate that mutations that disrupt normal
splicing lead to rapid degradation of the RNA in the nucleus. Splicing is a multistep process in the nucleus
and mutations in normal splicing can lead to blockades in the process at a variety of steps. Any of these
blockades can then lead to an abnormal and unstable RNA. Studies of mutants of normally processed
(polyadenylation and splicing) genes are relevant to the study of heterologous genes such as B.t. genes.
B.t. genes might contain functional signals that lead to the production of aberrant nonfunctional mRNAs,
and these mRNAs are likely to be unstable. But the B.t. genes are perhaps even more likely to contain
signals that are analogous to mutant signals in a natural gene. As shown above, these mutant signals are
very likely to cause defects in the processing pathways whose consequence is to produce unstable mRNAs.

It is not known with any certainty what signals RNA transcription termination in plant or animal cells.
Some studies on animal genes indicate that stretches of sequence rich in T cause termination by calf
thymus RNA polymerase II in vitro. These studies have shown that the 3' ends of in vitro terminated
transcripts often lie within runs of T such as T5, T6 or T7. Other identified sites have not been composed
solely of T, but have had one or more other nucleotides as well. Termination has been found to occur within
the sequences TATTTTTT, ATTCTC and TTCTT (Dedrick et al., 1987; Reines et al., 1987). In the case of these
latter two, the context in which the sequence is found has been C+T rich as well. It is not known if this is
essential. Other studies have implicated stretches of A as potential transcriptional terminators. An interesting
example from SV40 illustrates the uncertainty in defining terminators based on sequence alone. One
potential terminator in SV40 was identified as being A rich and having a region of dyad symmetry (potential
stem-loop) 5' to the A rich stretch. However, a second terminator identified experimentally downstream in
the same gene was not A rich and included no potential secondary structure (Kessler et al., 1988). Of
course, due to the A+T content of B.t. genes, they are rich in runs of A or T that could act as terminators.
The importance of termination to stability of the mRNA is shown by the globin gene example described
above. Absence of a normal polyA site leads to a failure in proper termination with a consequent decrease
in mRNA.
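
Runs of A or T of the kind mentioned above can be located with a simple pattern search. This is a minimal sketch rather than a validated terminator predictor: the function name homopolymer_runs is hypothetical, and the default minimum length of five is taken from the shortest run (T5) cited in the text.

```python
import re


def homopolymer_runs(seq, bases="AT", min_len=5):
    """Return (start, run) for each run of a single base at least min_len long."""
    seq = seq.upper()
    # Builds a pattern such as "A{5,}|T{5,}" from the requested bases.
    pattern = "|".join(f"{b}{{{min_len},}}" for b in bases)
    return [(m.start(), m.group()) for m in re.finditer(pattern, seq)]
```

Because the SV40 example shows that experimentally identified terminators need not be A rich, any run reported here is only a candidate.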

There is also an effect on mRNA stability due to the translation of the mRNA. Premature translational
termination in human triose phosphate isomerase leads to instability of the mRNA (Daar et al., 1988).
Another example is the beta-thalassemic globin mRNA described above that is specifically unstable in bone
marrow cells (Lim et al., 1988). The defect in this mutant gene is a single base pair deletion at codon 44
that leads to translational termination (a nonsense codon) at codon 60. Compared to properly translated
normal globin mRNA, this mutant RNA is very unstable. These results indicate that an improperly translated
mRNA is unstable. Other work in yeast indicates that proper but poor translation can have an effect on
mRNA levels. A heterologous gene was modified to convert certain codons to more yeast preferred codons.
An overall 10-fold increase in protein production was achieved, but there was also about a 3-fold increase in
mRNA (Hoekema et al., 1987). This indicates that more efficient translation can lead to greater mRNA
stability, and that the effect of codon usage can be at the RNA level as well as the translational level. It is
not clear from codon usage studies which codons lead to poor translation, or how this is coupled to mRNA
stability.

Therefore, it is an object of the present invention to provide a method for preparing synthetic plant
genes which express their respective proteins at relatively high levels when compared to wild-type genes. It
is yet another object of the present invention to provide synthetic plant genes which express the crystal
protein toxin of Bacillus thuringiensis at relatively high levels.



BRIEF DESCRIPTION OF THE DRAWINGS 


Figure 1 illustrates the steps employed in modifying a wild-type gene to increase expression
efficiency in plants.

Figure 2 illustrates a comparison of the changes in the modified B.t.k. HD-1 sequence of Example 1
(lower line) versus the wild-type sequence of B.t.k. HD-1 which encodes the crystal protein toxin (upper
line).

Figure 3 illustrates a comparison of the changes in the synthetic B.t.k. HD-1 sequence of Example 2
(lower line) versus the wild-type sequence of B.t.k. HD-1 which encodes the crystal protein toxin (upper
line).

Figure 4 illustrates a comparison of the changes in the synthetic B.t.k. HD-73 sequence of Example 3
(lower line) versus the wild-type sequence of B.t.k. HD-73 (upper line).

Figure 5 represents a plasmid map of intermediate plant transformation vector cassette pMON893.

Figure 6 represents a plasmid map of intermediate plant transformation vector cassette pMON900.

Figure 7 represents a map for the disarmed T-DNA of A. tumefaciens ACO.

Figure 8 illustrates a comparison of the changes in the synthetic truncated B.t.k. HD-73 gene (amino
acids 29-615 with an N-terminal Met-Ala) of Example 3 (lower line) versus the wild-type sequence of B.t.k.
HD-73 (upper line).

Figure 9 illustrates a comparison of the changes in the synthetic/wild-type full-length B.t.k. HD-73
sequence of Example 3 (lower line) versus the wild-type full-length sequence of B.t.k. HD-73 (upper line).

Figure 10 illustrates a comparison of the changes in the synthetic/modified full-length B.t.k. HD-73
sequence of Example 3 (lower line) versus the wild-type full-length sequence of B.t.k. HD-73 (upper line).

Figure 11 illustrates a comparison of the changes in the fully synthetic full-length B.t.k. HD-73
sequence of Example 3 (lower line) versus the wild-type full-length sequence of B.t.k. HD-73 (upper line).

Figure 12 illustrates a comparison of the changes in the synthetic B.t.t. sequence of Example 5
(lower line) versus the wild-type sequence of B.t.t. which encodes the crystal protein toxin (upper line).

Figure 13 illustrates a comparison of the changes in the synthetic B.t. P2 sequence of Example 6
(lower line) versus the wild-type sequence of B.t.k. HD-1 which encodes the P2 protein toxin (upper line).

Figure 14 illustrates a comparison of the changes in the synthetic B.t. entomocidus sequence of
Example 7 (lower line) versus the wild-type sequence of B.t. entomocidus which encodes the Btent protein
toxin (upper line).

Figure 15 illustrates a plasmid map for plant expression cassette vector pMON744.

Figure 16 illustrates a comparison of the changes in the synthetic potato leaf roll virus (PLRV) coat
protein sequence of Example 9 (lower line) versus the wild-type coat protein sequence of PLRV (upper
line).

STATEMENT OF THE INVENTION 



The present invention provides a method for preparing synthetic plant genes which genes express their
protein product at levels significantly higher than the wild-type genes which were commonly employed in
plant transformation heretofore. In another aspect, the present invention also provides novel synthetic plant
genes which encode non-plant proteins.

For brevity and clarity of description, the present invention will be primarily described with respect to
the preparation of synthetic plant genes which encode the crystal protein toxin of Bacillus thuringiensis
(B.t.). Suitable B.t. subspecies include, but are not limited to, B.t. kurstaki HD-1, B.t. kurstaki HD-73, B.t.
sotto, B.t. berliner, B.t. thuringiensis, B.t. tolworthi, B.t. dendrolimus, B.t. alesti, B.t. galleriae, B.t.
aizawai, B.t. subtoxicus, B.t. entomocidus, B.t. tenebrionis and B.t. san diego. However, those skilled in
the art will recognize and it should be understood that the present method may be used to prepare
synthetic plant genes which encode non-plant proteins other than the crystal protein toxin of B.t. as well as
plant proteins (see, for instance, Example 9).

The expression of B.t. genes in plants is problematic. Although the expression of B.t. genes in plants at
insecticidal levels has been reported, this accomplishment has not been straightforward. In particular, the
expression of a full-length lepidopteran specific B.t. gene (comprising DNA from a B.t.k. isolate) has been
reported to be unsuccessful in yielding insecticidal levels of expression in some plant species (Vaeck et al.,
1987 and Barton et al., 1987).

It has been reported that expression of the full-length gene from B.t.k. HD-1 was detectable in tomato
plants but that truncated genes led to a higher frequency of insecticidal plants with an overall higher level of
expression. Truncated genes of B.t. berliner also led to a higher frequency of insecticidal plants in tobacco
(Vaeck et al., 1987). On the other hand, insecticidal plants were obtained from lettuce transformants using a
full-length gene.

It has also been reported that the full length gene from B.t.k. HD-73 gave some insecticidal effect in
tobacco (Adang et al., 1987). However, the B.t. mRNA detected in these plants was only 1.7 kb compared
to the expected 3.7 kb, indicating improper expression of the gene. It was suggested that this truncated
mRNA was too short to encode a functional truncated toxin, but there must have been a low level of longer
mRNA in some plants or no insecticidal activity would have been observed. Others have reported in a
publication that they observed a large amount of shorter than expected mRNA from a truncated B.t.k. gene,
but some mRNA of the expected size was also observed. In fact, it was suggested that expression of the
full length gene is toxic to tobacco callus (Barton et al., 1987). The above illustrates that lepidopteran type
B.t. genes are poorly expressed in plants compared to other chimeric genes previously expressed from the
same promoter cassettes.

The expression of B.t.t. in tomato and potato is at levels similar to that of B.t.k. (i.e., poor). B.t.t. and
B.t.k. genes share only limited sequence homology, but they share many common features in terms of
base composition and the presence of particular A+T rich elements.

All reports in the field have noted the lower than expected expression of B.t. genes in plants. In general,
insecticidal efficacy has been measured using insects very sensitive to B.t. toxin such as tobacco
hornworm. Although it has been possible to obtain plants totally protected against tobacco hornworm, it is
important to note that hornworm is up to 500 fold more sensitive to B.t. toxin than some agronomically
important insect pests such as beet armyworm. It is therefore of interest to obtain transgenic plants that are
protected against all important lepidopteran pests (or against Colorado potato beetle in the case of B.t.
tenebrionis), and in addition to have a level of B.t. expression that provides an additional safety margin
over and above the efficacious protection level. It is also important to devise plant genes which function
reproducibly from species to species, so that insect resistant plants can be obtained in a predictable
fashion.

In order to achieve these goals, it is important to understand the nature of the poorer than expected
expression of B.t. genes in plants. The level of stable B.t. mRNA in plants is much lower than expected.
That is, compared to other coding sequences driven by the same promoter, the level of B.t. mRNA
measured by Northern analysis or nuclease protection experiments is much lower. For example, tomato
plant 337 (Fischhoff et al., 1987) was selected as the best expressing plant with pMON9711, which contains
the B.t.k. HD-1 KpnI fragment driven by the CaMV 35S promoter and contains the NOS-NPTII-NOS
selectable marker gene. In this plant the level of B.t. mRNA is between 100 and 1000 fold lower than the level
of NPTII mRNA, even though the 35S promoter is approximately 50-fold stronger than the NOS promoter
(Sanders et al., 1987).

The level of B.t. toxin protein detected in plants is consistent with the low level of B.t. mRNA. Moreover,
the insecticidal efficacy of the transgenic plants correlates with the B.t. protein level, indicating that the toxin
protein produced in plants is biologically active. Therefore, the low level of B.t. toxin expression may be the
result of the low levels of B.t. mRNA.

Messenger RNA levels are determined by the rate of synthesis and rate of degradation. It is the
balance between these two that determines the steady state level of mRNA. The rate of synthesis has been
maximized by the use of the CaMV 35S promoter, a strong constitutive plant expressible promoter. The use
of other plant promoters such as nopaline synthase (NOS), mannopine synthase (MAS) and ribulose
bisphosphate carboxylase small subunit (RUBISCO) has not led to dramatic changes in the levels of B.t.
toxin protein expression, indicating that the effects determining B.t. toxin protein levels are promoter
independent. These data imply that the coding sequences of DNA genes encoding B.t. toxin proteins are
somehow responsible for the poor expression level, and that this effect is manifested by a low level of
accumulated stable mRNA.
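
The balance between synthesis and degradation can be written as a simple rate model. The following is a minimal sketch that assumes a constant synthesis rate and first-order degradation; the symbols M (mRNA level), k_s (synthesis rate) and k_d (degradation rate constant) are introduced here for illustration and are not taken from the original text.

```latex
\frac{dM}{dt} = k_s - k_d M, \qquad
M_{ss} = \frac{k_s}{k_d}, \qquad
t_{1/2} = \frac{\ln 2}{k_d}
```

Under this assumption, a coding sequence whose mRNA is degraded faster (larger k_d) accumulates proportionally less steady state mRNA even when the promoter, and therefore k_s, is unchanged.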

Lower than expected levels of mRNA have been observed with four different lepidopteran specific
genes (two from B.t.k. HD-1; B.t. berliner and B.t.k. HD-73) as well as the gene from the coleopteran
specific B.t. tenebrionis. It appears that for lepidopteran type B.t. genes these effects are manifest more
strongly in the full length coding sequences than in the truncated coding sequences. These effects are seen
across plant species although their magnitude seems greater in some plant species such as tobacco.

The nature of the coding sequences of B.t. genes distinguishes them from plant genes as well as many
other heterologous genes expressed in plants. In particular, B.t. genes are very rich (~62%) in adenine (A)
and thymine (T) while plant genes and most bacterial genes which have been expressed in plants are on
the order of 45-55% A+T. The A+T content of the genome (and thus the genes) of any organism is a
feature of that organism and reflects its evolutionary history. While within any one organism genes have
similar A+T content, the A+T content can vary tremendously from organism to organism. For example,
some Bacillus species have among the most A+T rich genomes while some Streptomyces species are
among the least A+T rich genomes (~30 to 35% A+T).

Due to the degeneracy of the genetic code and the limited number of codon choices for any amino
acid, most of the "excess" A+T of the structural coding sequences of some Bacillus species is found in
the third position of the codons. That is, genes of some Bacillus species have A or T as the third nucleotide
in many codons. Thus A+T content in part can determine codon usage bias. In addition, it is clear that
genes evolve for maximum function in the organism in which they evolve. This means that particular
nucleotide sequences found in a gene from one organism, where they may play no role except to code for
a particular stretch of amino acids, have the potential to be recognized as gene control elements in another
organism (such as transcriptional promoters or terminators, polyA addition sites, intron splice sites, or
specific mRNA degradation signals). It is perhaps surprising that such misread signals are not a more
common feature of heterologous gene expression, but this can be explained in part by the relatively
homogeneous A+T content (~50%) of many organisms. This A+T content plus the nature of the genetic
code put clear constraints on the likelihood of occurrence of any particular oligonucleotide sequence. Thus,
a gene from E. coli with a 50% A+T content is much less likely to contain any particular A+T rich
segment than a gene from B. thuringiensis.

As described above, the expression of B.t. toxin protein in plants has been problematic. Although the
observations made in other systems described above offer the hope of a means to elevate the expression
level of B.t. toxin proteins in plants, the success obtained by the present method is quite unexpected.
Indeed, inasmuch as it has been recently reported that expression of the full-length B.t.k. toxin protein in
tobacco makes callus tissue necrotic (Barton et al., 1987), one would reasonably expect high level
expression of B.t. toxin protein to be unattainable due to the reported toxicity effects.

In its most rigorous application, the method of the present invention involves the modification of an
existing structural coding sequence ("structural gene") which codes for a particular protein by removal of
ATTTA sequences and putative polyadenylation signals by site directed mutagenesis of the DNA comprising
the structural gene. It is most preferred that substantially all the polyadenylation signals and ATTTA
sequences are removed although enhanced expression levels are observed with only partial removal of
either of the above identified sequences. Alternately, if a synthetic gene is prepared which codes for the
expression of the subject protein, codons are selected to avoid the ATTTA sequence and putative
polyadenylation signals. For purposes of the present invention putative polyadenylation signals include, but
are not necessarily limited to, AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA,
ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA. In replacing
the ATTTA sequences and polyadenylation signals, codons are preferably utilized which avoid the codons
which are rarely found in plant genomes.
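
The scan for ATTTA sequences and putative polyadenylation signals described above can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the disclosed method: the function names (find_motif, scan_coding_sequence) and the example sequence are hypothetical, and the signal list is simply the one recited in the preceding paragraph.

```python
import re

# Putative polyadenylation signals as recited in the text above.
POLYA_SIGNALS = [
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA",
    "ATAAAA", "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA",
    "ATTAAA", "AATTAA", "AATACA", "CATAAA",
]


def find_motif(seq, motif):
    """Return 0-based start positions of every match, including overlapping ones."""
    # A zero-width lookahead lets finditer report overlapping occurrences.
    pattern = "(?=" + re.escape(motif) + ")"
    return [m.start() for m in re.finditer(pattern, seq.upper())]


def scan_coding_sequence(seq):
    """Map each motif found in seq (signals plus ATTTA) to its start positions."""
    hits = {}
    for motif in POLYA_SIGNALS + ["ATTTA"]:
        positions = find_motif(seq, motif)
        if positions:
            hits[motif] = positions
    return hits


if __name__ == "__main__":
    example = "ATGGCAATAAATTTAGGCAATTAAGC"  # made-up sequence, for illustration only
    for motif, positions in sorted(scan_coding_sequence(example).items()):
        print(motif, positions)
```

Positions are reported on the strand supplied; whether a hit should actually be altered still depends on the codon choices available at that location.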

Another embodiment of the present invention, represented in the flow diagram of Figure 1, employs a
method for the modification of an existing structural gene or alternately the de novo synthesis of a
structural gene which method is somewhat less rigorous than the method first described above. Referring to
Figure 1, the selected DNA sequence is scanned to identify regions with greater than four consecutive
adenine (A) or thymine (T) nucleotides. The A+T regions are scanned for potential plant polyadenylation
signals. Although the absence of five or more consecutive A or T nucleotides eliminates most plant
polyadenylation signals, if there is more than one of the minor polyadenylation signals identified within ten
nucleotides of each other, then the nucleotide sequence of this region is preferably altered to remove these
signals while maintaining the original encoded amino acid sequence.
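
The first step of Figure 1 can be illustrated with the following Python sketch. It is offered under stated assumptions: the helper names (at_runs, signals_clustered) are hypothetical, and "within ten nucleotides of each other" is interpreted here as a distance of at most ten between signal start positions, which the text does not specify.

```python
import re


def at_runs(seq, min_len=5):
    """Return (start, end, run) for each stretch of >= min_len consecutive A or T.

    min_len=5 corresponds to "greater than four consecutive" A or T nucleotides.
    """
    pattern = re.compile("[AT]{%d,}" % min_len)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


def signals_clustered(start_positions, max_gap=10):
    """True if any two signal start positions lie within max_gap nucleotides.

    Measuring from start position to start position is an assumption of this sketch.
    """
    ordered = sorted(start_positions)
    return any(b - a <= max_gap for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    print(at_runs("GCATTTAAGCGCATAT"))
    print(signals_clustered([5, 40, 48]))
```

A region flagged by both helpers would be a candidate for alteration, subject to keeping the encoded amino acid sequence unchanged.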

The second step is to consider the 15 to 30 nucleotide regions surrounding the A+T rich region
identified in step one. If the A+T content of the surrounding region is less than 80%, the region should be
examined for polyadenylation signals. Alteration of the region based on polyadenylation signals is dependent
upon (1) the number of polyadenylation signals present and (2) presence of a major plant polyadenylation
signal.

The extended region is examined for the presence of plant polyadenylation signals. The polyadenylation
signals are removed by site-directed mutagenesis of the DNA sequence. The extended region is also
examined for multiple copies of the ATTTA sequence which are also removed by mutagenesis.

It is also preferred that regions comprising many consecutive A+T bases or G+C bases are disrupted
since these regions are predicted to have a higher likelihood to form hairpin structure due to
self-complementarity. Therefore, insertion of heterogeneous base pairs would reduce the likelihood of
self-complementary secondary structure formation, which is known to inhibit transcription and/or translation in
some organisms. In most cases, the adverse effects may be minimized by using sequences which do not
contain more than five consecutive A+T or G+C.

SYNTHETIC OLIGONUCLEOTIDES FOR MUTAGENESIS

The oligonucleotides used in the mutagenesis are designed to maintain the proper amino acid
sequence and reading frame and preferably to not introduce common restriction sites such as BglII, HindIII,
SacI, KpnI, EcoRI, NcoI, PstI and SalI into the modified gene. These restriction sites are found in multi-linker
insertion sites of cloning vectors such as plasmids pUC118 and pMON7258. Of course, the introduction of
new polyadenylation signals, ATTTA sequences or consecutive stretches of more than five A+T or G+C
should also be avoided. The preferred size for the oligonucleotides is around 40-50 bases, but fragments
ranging from 18 to 100 bases have been utilized. In most cases, a minimum of 5 to 8 base pairs of
homology to the template DNA on both ends of the synthesized fragment are maintained to ensure proper
hybridization of the primer to the template. The oligonucleotides should avoid sequences longer than five
base pairs A+T or G+C. Codons used in the replacement of wild-type codons should preferably avoid the
TA or CG doublet wherever possible. Codons are selected from a plant preferred codon table (such as
Table I below) so as to avoid codons which are rarely found in plant genomes, and efforts should be made
to select codons to preferably adjust the G+C content to about 50%.

Table I

Preferred Codon Usage in Plants

Amino Acid   Codon   Percent Usage in Plants

ARG          CGA       7
             CGC      11
             CGG       5
             CGU      25
             AGA      29
             AGG      23

LEU          CUA       8
             CUC      20
             CUG      10
             CUU      28
             UUA       5
             UUG      30

SER          UCA      14
             UCC      26
             UCG       3
             UCU      21
             AGC      21
             AGU      15

THR          ACA      21
             ACC      41
             ACG       7
             ACU      31

PRO          CCA      45
             CCC      19
             CCG       9
             CCU      26

ALA          GCA      23
             GCC      32
             GCG       3
             GCU      41

GLY          GGA      32
             GGC      20
             GGG      11
             GGU      37

ILE          AUA      12
             AUC      45
             AUU      43

VAL          GUA       9
             GUC      20
             GUG      28
             GUU      43

LYS          AAA      36
             AAG      64

ASN          AAC      72
             AAU      28

GLN          CAA      64
             CAG      36

HIS          CAC      65
             CAU      35

GLU          GAA      48
             GAG      52

ASP          GAC      48
             GAU      52

TYR          UAC      68
             UAU      32

CYS          UGC      78
             UGU      22

PHE          UUC      56
             UUU      44

MET          AUG     100

TRP          UGG     100
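
As an illustration of how Table I might be consulted when choosing replacement codons, the following Python sketch filters the codons for an amino acid by usage and by the TA (UA in RNA) and CG doublets. It is a sketch under assumptions: the function name acceptable_codons and the 10% cutoff for a "rarely found" codon are hypothetical choices not stated in the text, and only three amino acids from Table I are included for brevity.

```python
# Percent usage in plants, copied from Table I (subset shown for brevity).
PLANT_CODON_USAGE = {
    "ARG": {"CGA": 7, "CGC": 11, "CGG": 5, "CGU": 25, "AGA": 29, "AGG": 23},
    "LYS": {"AAA": 36, "AAG": 64},
    "ASN": {"AAC": 72, "AAU": 28},
}


def acceptable_codons(amino_acid, min_percent=10, avoid_doublets=("UA", "CG")):
    """List codons for amino_acid, most used first, after two filters.

    min_percent is a hypothetical cutoff for "rarely found" codons;
    avoid_doublets drops codons containing a UA (TA in DNA) or CG doublet.
    """
    usage = PLANT_CODON_USAGE[amino_acid]
    kept = [
        codon for codon, pct in usage.items()
        if pct >= min_percent and not any(d in codon for d in avoid_doublets)
    ]
    return sorted(kept, key=lambda codon: usage[codon], reverse=True)


if __name__ == "__main__":
    for aa in PLANT_CODON_USAGE:
        print(aa, acceptable_codons(aa))
```

In practice the choice among the remaining codons would also be weighed against the other criteria in the text, such as G+C content and avoidance of new restriction sites.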



Regions with many consecutive A+T bases or G+C bases are predicted to have a higher likelihood to
form hairpin structures due to self-complementarity. Disruption of these regions by the insertion of
heterogeneous base pairs is preferred and should reduce the likelihood of the formation of self-complementary
secondary structures such as hairpins which are known in some organisms to inhibit transcription
(transcriptional terminators) and translation (attenuators). However, it is difficult to predict the biological
effect of a potential hairpin forming region.

It is evident to those skilled in the art that while the above description is directed toward the
modification of the DNA sequences of wild-type genes, the present method can be used to construct a
completely synthetic gene for a given amino acid sequence. Regions with five or more consecutive A+T or
G+C nucleotides should be avoided. Codons should be selected avoiding the TA and CG doublets in
codons whenever possible. Codon usage can be normalized against a plant preferred codon usage table
(such as Table I) and the G+C content preferably adjusted to about 50%. The resulting sequence should
be examined to ensure that there are minimal putative plant polyadenylation signals and ATTTA sequences.
Restriction sites found in commonly used cloning vectors are also preferably avoided. However, placement
of several unique restriction sites throughout the gene is useful for analysis of gene expression or
construction of gene variants.
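
Several of the design criteria just listed lend themselves to a simple automated check of a candidate sequence. The Python sketch below is illustrative only: the function names are hypothetical, the reading frame is assumed to start at the first nucleotide, and the example sequence is made up.

```python
import re


def gc_content(seq):
    """Fraction of G and C nucleotides in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def homogeneous_runs(seq, min_len=5):
    """Return (start, run) for stretches of >= min_len consecutive A/T or G/C."""
    pattern = "[AT]{%d,}|[GC]{%d,}" % (min_len, min_len)
    return [(m.start(), m.group()) for m in re.finditer(pattern, seq.upper())]


def codons_with_doublets(seq, doublets=("TA", "CG")):
    """Return (codon_index, codon) for in-frame codons containing TA or CG.

    The frame is assumed to begin at position 0; trailing bases that do not
    complete a codon are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return [(i, c) for i, c in enumerate(codons) if any(d in c for d in doublets)]


if __name__ == "__main__":
    candidate = "ATGGCCTACCGTAAAAAG"  # made-up sequence, for illustration only
    print(round(gc_content(candidate), 3))
    print(homogeneous_runs(candidate))
    print(codons_with_doublets(candidate))
```

Only doublets inside a codon are reported here; doublets formed across a codon boundary would need a separate pass if they were also of interest.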



Plant Gene Construction 

The expression of a plant gene which exists in double-stranded DNA form involves transcription of
messenger RNA (mRNA) from one strand of the DNA by RNA polymerase enzyme, and the subsequent
processing of the mRNA primary transcript inside the nucleus. This processing involves a 3' non-translated
region which adds polyadenylate nucleotides to the 3' end of the RNA. Transcription of DNA into mRNA is
regulated by a region of DNA usually referred to as the "promoter." The promoter region contains a
sequence of bases that signals RNA polymerase to associate with the DNA and to initiate the transcription
of mRNA using one of the DNA strands as a template to make a corresponding strand of RNA.

A number of promoters which are active in plant cells have been described in the literature. These
include the nopaline synthase (NOS) and octopine synthase (OCS) promoters (which are carried on
tumor-inducing plasmids of Agrobacterium tumefaciens), the Cauliflower Mosaic Virus (CaMV) 19S and 35S
promoters, the light-inducible promoter from the small subunit of ribulose bis-phosphate carboxylase
(ssRUBISCO, a very abundant plant polypeptide) and the mannopine synthase (MAS) promoter (Velten et
al. 1984 and Velten & Schell, 1985). All of these promoters have been used to create various types of DNA
constructs which have been expressed in plants (see e.g., PCT publication WO84/02913 (Rogers et al.,
Monsanto)).

Promoters which are known or are found to cause transcription of RNA in plant cells can be used in the
present invention. Such promoters may be obtained from plants or plant viruses and include, but are not
limited to, the CaMV35S promoter and promoters isolated from plant genes such as ssRUBISCO genes. As
described below, it is preferred that the particular promoter selected should be capable of causing sufficient
expression to result in the production of an effective amount of protein.

The promoters used in the DNA constructs (i.e. chimeric plant genes) of the present invention may be
modified, if desired, to affect their control characteristics. For example, the CaMV35S promoter may be
ligated to the portion of the ssRUBISCO gene that represses the expression of ssRUBISCO in the absence
of light, to create a promoter which is active in leaves but not in roots. The resulting chimeric promoter may
be used as described herein. For purposes of this description, the phrase "CaMV35S" promoter thus
includes variations of CaMV35S promoter, e.g., promoters derived by means of ligation with operator
regions, random or controlled mutagenesis, etc. Furthermore, the promoters may be altered to contain
multiple "enhancer sequences" to assist in elevating gene expression.

The RNA produced by a DNA construct of the present invention also contains a 5' non-translated leader
sequence. This sequence can be derived from the promoter selected to express the gene, and can be
specifically modified so as to increase translation of the mRNA. The 5' non-translated regions can also be
obtained from viral RNAs, from suitable eukaryotic genes, or from a synthetic gene sequence. The present
invention is not limited to constructs, as presented in the following examples. Rather, the non-translated
leader sequence can be part of the 5' end of the non-translated region of the coding sequence for the virus
coat protein, or part of the promoter sequence, or can be derived from an unrelated promoter or coding
sequence. In any case, it is preferred that the sequence flanking the initiation site conform to the
translational consensus sequence rules for enhanced translation initiation reported by Kozak (1984).

The DNA construct of the present invention also contains a modified or fully-synthetic structural coding
sequence which has been changed to enhance the performance of the gene in plants. In a particular
embodiment of the present invention the enhancement method has been applied to design modified and
fully synthetic genes encoding the crystal toxin protein of Bacillus thuringiensis. The structural genes of
the present invention may optionally encode a fusion protein comprising an amino-terminal chloroplast
transit peptide or secretory signal sequence (see for instance, Examples 10 and 11).

The DNA construct also contains a 3' non-translated region. The 3' non-translated region contains a
polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3'
end of the viral RNA. Examples of suitable 3' regions are (1) the 3' transcribed, non-translated regions
containing the polyadenylation signal of Agrobacterium tumor-inducing (Ti) plasmid genes, such as the
nopaline synthase (NOS) gene, and (2) plant genes like the soybean storage protein (7S) genes and the
small subunit of the RuBP carboxylase (E9) gene. An example of a preferred 3' region is that from the 7S
gene, described in greater detail in the examples below.



Plant Transformation 

A chimeric plant gene containing a structural coding sequence of the present invention can be inserted
into the genome of a plant by any suitable method. Suitable plants for use in the practice of the present
invention include, but are not limited to, soybean, cotton, alfalfa, oilseed rape, flax, tomato, sugarbeet,
sunflower, potato, tobacco, maize, rice and wheat. Suitable plant transformation vectors include those
derived from a Ti plasmid of Agrobacterium tumefaciens, as well as those disclosed, e.g., by
Herrera-Estrella (1983), Bevan (1983), Klee (1985) and EPO publication 120,516 (Schilperoort et al.). In addition to
plant transformation vectors derived from the Ti or root-inducing (Ri) plasmids of Agrobacterium, alternative
methods can be used to insert the DNA constructs of this invention into plant cells. Such methods
may involve, for example, the use of liposomes, electroporation, chemicals that increase free DNA uptake,
free DNA delivery via microprojectile bombardment, and transformation using viruses or pollen.

A particularly useful Ti plasmid cassette vector for transformation of dicotyledonous plants is shown in
Figure 5. Referring to Figure 5, the expression cassette pMON893 consists of the enhanced CaMV35S
promoter (EN 35S) and the 3' end including polyadenylation signals from a soybean gene encoding the
alpha-prime subunit of beta-conglycinin. Between these two elements is a multilinker containing multiple
restriction sites for the insertion of genes.

The enhanced CaMV35S promoter was constructed as follows. A fragment of the CaMV35S promoter
extending between position -343 and +9 was previously constructed in pUC13 by Odell et al. (1985). This
segment contains a region identified by Odell et al. (1985) as being necessary for maximal expression of
the CaMV35S promoter. It was excised as a ClaI-HindIII fragment, made blunt ended with DNA polymerase
I (Klenow fragment) and inserted into the HincII site of pUC18. This upstream region of the 35S promoter
was excised from this plasmid as a HindIII-EcoRV fragment (extending from -343 to -90) and inserted into
the same plasmid between the HindIII and PstI sites. The enhanced CaMV35S promoter thus contains a
duplication of sequences between -343 and -90 (Kay et al., 1987).

The 3' end of the 7S gene is derived from the 7S gene contained on the clone designated 17.1
(Schuler et al., 1982). This 3' end fragment, which includes the polyadenylation signals, extends from an
AvaII site located about 30 bp upstream of the termination codon for the beta-conglycinin gene in clone
17.1 to an EcoRI site located about 450 bp downstream of this termination codon.

The remainder of pMON893 contains a segment of pBR322 which provides an origin of replication in E.
coli and a region for homologous recombination with the disarmed T-DNA in Agrobacterium strain ACO
(described below); the oriV region from the broad host range plasmid RK1; the streptomycin/spectinomycin
resistance gene from Tn7; and a chimeric NPTII gene, containing the CaMV35S promoter and the nopaline
synthase (NOS) 3' end, which provides kanamycin resistance in transformed plant cells.

Referring to Figure 6, transformation vector plasmid pMON900 is a derivative of pMON893. The
enhanced CaMV35S promoter of pMON893 has been replaced with the 1.5 kb mannopine synthase (MAS)
promoter (Velten et al. 1984). The other segments are the same as plasmid pMON893. After incorporation
of a DNA construct into plasmid vector pMON893 or pMON900, the intermediate vector is introduced into
A. tumefaciens strain ACO which contains a disarmed Ti plasmid. Cointegrate Ti plasmid vectors are
selected and used to transform dicotyledonous plants.

Referring to Figure 7, A. tumefaciens ACO is a disarmed strain similar to pTiB6SE described by Fraley
et al. (1985). For construction of ACO the starting Agrobacterium strain was the strain A208 which contains
a nopaline-type Ti plasmid. The Ti plasmid was disarmed in a manner similar to that described by Fraley et
al. (1985) so that essentially all of the native T-DNA was removed except for the left border and a few
hundred base pairs of T-DNA inside the left border. The remainder of the T-DNA extending to a point just
beyond the right border was replaced with a novel piece of DNA including (from left to right) a segment of
pBR322, the oriV region from plasmid RK2, and the kanamycin resistance gene from Tn601. The pBR322
and oriV segments are similar to the segments in pMON893 and provide a region of homology for
cointegrate formation.

The following examples are provided to better elucidate the practice of the present invention and should
not be interpreted in any way to limit the scope of the present invention. Those skilled in the art will
recognize that various modifications, truncations etc. can be made to the methods and genes described
herein while not departing from the spirit and scope of the present invention.

Example 1 - Modified B.t.k. HD-1 Gene

Referring to Figure 2, the wild-type B.t.k. HD-1 gene is known to be expressed poorly in plants as a full
length gene or as a truncated gene. The G+C content of the B.t.k. gene is low (37%), and the gene contains
many A+T rich regions, potential polyadenylation sites (18 sites; see Table II for the list of sequences) and
numerous ATTTA sequences.

Table II

List of Sequences of the Potential Polyadenylation Signals

AATAAA*      AAGCAT
AATAAT*      ATTAAT
AACCAA       ATACAT
ATATAA       AAAATA
AATCAA       ATTAAA**
ATACTA       AATTAA**
ATAAAA       AATACA**
ATGAAA       CATAAA**

*  indicates a potential major plant polyadenylation site.
** indicates a potential minor animal polyadenylation site.
All others are potential minor plant polyadenylation sites.

Table III lists the synthetic oligonucleotides designed and synthesized for the site-directed mutagenesis
of the B.t.k. HD-1 gene.

Table III

Mutagenesis Primers for B.t.k. HD-1 Gene

Primer      Length (bp)   Sequence

BTK185      18            TCCCCAGATA ATATCAAC

BTK240      48            GGCTTGATTC CTAGCGAACT
                          CTTCGATTCT CTGGTTGATG
                          AGCTGTTC

BTK462      54            CAAAACTGAG AGGTGGAGGT
                          TGGCAGCTTG AACGTACACG
                          GAGAGGAGAG GAAC

BTK669      48            AGTTAGTGTA AGCTCTCTTC
                          TGAACTGGTT GTACCTGATC
                          CAATCTCT

BTK930      39            AGCCATGATC TGGTGACCGG
                          ACCAGTAGTA TTCTCCTCT

BTK1110     32            AGTTGTTGGT TGTTGATCCC
                          GATGTTAAAA GG

BTK1380A    37            GTGATGAAGG GATGATGTTG
                          TTGAACTCAG CACTACG

BTK1380T    100           CAGAAGTTCC AGAGCCAAGA
                          TTAGTAGACT TGGTGAGTGG
                          GATTTGGGTG ATTTGTGATG
                          AAGGGATGAT GTTGTTGAAC
                          TCAGCACTAC GATGTATCCA

BTK1600     27            TGATGTGTGG AACTGAAGGT
                          TTGTGGT
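
Because primer sequences are easily corrupted in transcription, a short consistency check of Table III may be useful. The Python sketch below is illustrative and its helper name (length_mismatches) is hypothetical; it compares each stated length with the number of bases listed and confirms that the 37-mer BTK1380A is contained within the 100-mer BTK1380T.

```python
# Stated length and sequence for each primer, copied from Table III.
PRIMERS = {
    "BTK185": (18, "TCCCCAGATA ATATCAAC"),
    "BTK240": (48, "GGCTTGATTC CTAGCGAACT CTTCGATTCT CTGGTTGATG AGCTGTTC"),
    "BTK462": (54, "CAAAACTGAG AGGTGGAGGT TGGCAGCTTG AACGTACACG GAGAGGAGAG GAAC"),
    "BTK669": (48, "AGTTAGTGTA AGCTCTCTTC TGAACTGGTT GTACCTGATC CAATCTCT"),
    "BTK930": (39, "AGCCATGATC TGGTGACCGG ACCAGTAGTA TTCTCCTCT"),
    "BTK1110": (32, "AGTTGTTGGT TGTTGATCCC GATGTTAAAA GG"),
    "BTK1380A": (37, "GTGATGAAGG GATGATGTTG TTGAACTCAG CACTACG"),
    "BTK1380T": (100, "CAGAAGTTCC AGAGCCAAGA TTAGTAGACT TGGTGAGTGG GATTTGGGTG "
                      "ATTTGTGATG AAGGGATGAT GTTGTTGAAC TCAGCACTAC GATGTATCCA"),
    "BTK1600": (27, "TGATGTGTGG AACTGAAGGT TTGTGGT"),
}


def bases(sequence):
    """Remove the spacing used for readability in the table."""
    return sequence.replace(" ", "")


def length_mismatches(primers):
    """Return names of primers whose base count differs from the stated length."""
    return [name for name, (stated, seq) in primers.items()
            if len(bases(seq)) != stated]


if __name__ == "__main__":
    print(length_mismatches(PRIMERS))
    print(bases(PRIMERS["BTK1380A"][1]) in bases(PRIMERS["BTK1380T"][1]))
```

An empty list from length_mismatches means every listed sequence has the stated number of bases.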



The B.t.k. HD-1 gene (BglII fragment from pMON9921 encoding amino acids 29-607 with a Met-Ala at
the N-terminus) was cloned into pMON7258 (pUC118 derivative which contains a BglII site in the multilinker
cloning region) at the BglII site resulting in pMON5342. The orientation of the B.t.k. gene was chosen so
that the opposite strand (negative strand) was synthesized in filamentous phage particles for the
mutagenesis. The procedure of Kunkel (1985) was used for the mutagenesis using plasmid pMON5342 as
starting material.

The regions for mutagenesis were selected in the following manner. All regions of the DNA sequence of
the B.t.k. gene were identified which contained five or more consecutive base pairs which were A or T.
These were ranked in terms of length and highest percentage of A+T in the surrounding sequence over a
20-30 base pair region. The DNA was then analysed for regions which might contain polyadenylation sites
(see Table II above) or ATTTA sequences. Oligonucleotides were designed which maximized the elimination
of A+T consecutive regions which contained one or more polyadenylation sites or ATTTA sequences.
Two potential plant polyadenylation sites were rated more critical (see Table II) based on published reports.
Codons were selected which increased G+C content, did not generate restriction sites for enzymes useful
for cloning and assembly of the modified gene (BamHI, BglII, SacI, NcoI, EcoRV) and did not contain the
doublets TA or CG which have been reported to be infrequently found in codons in plants. The
oligonucleotides were at least 18 bp long ranging up to 100 base pairs and contained at least 5-8 base pairs
of direct homology to native sequences at the ends of the fragments for efficient hybridization and priming
in site-directed mutagenesis reactions. Figure 2 compares the wild-type B.t.k. HD-1 gene sequence with the
sequence which resulted from the modifications by site-directed mutagenesis.
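
The ranking step described in this paragraph can be sketched as follows. This is an illustration under assumptions rather than the procedure actually used: the function name rank_at_regions is hypothetical, the window is centred on each run and fixed at 30 base pairs (the text gives only a 20-30 base pair range), and ties in run length are broken by the A+T fraction of the window.

```python
import re


def rank_at_regions(seq, min_run=5, window=30):
    """Rank runs of >= min_run consecutive A/T nucleotides.

    Each result is (run_length, at_fraction, start), sorted so that longer runs
    come first and, among equal lengths, the run with the higher A+T fraction
    in the surrounding window comes first. The window of about `window` bp is
    centred on the run and clipped at the ends of the sequence.
    """
    seq = seq.upper()
    ranked = []
    for m in re.finditer("[AT]{%d,}" % min_run, seq):
        centre = (m.start() + m.end()) // 2
        lo = max(0, centre - window // 2)
        hi = min(len(seq), centre + window // 2)
        region = seq[lo:hi]
        at_fraction = (region.count("A") + region.count("T")) / len(region)
        ranked.append((m.end() - m.start(), at_fraction, m.start()))
    return sorted(ranked, reverse=True)


if __name__ == "__main__":
    demo = "GCGC" + "AAAAA" + "GCGCGCGC" + "ATATATA" + "GCGC"  # made-up sequence
    for run_length, at_fraction, start in rank_at_regions(demo):
        print(start, run_length, round(at_fraction, 2))
```

The top-ranked regions would then be inspected for polyadenylation sites and ATTTA sequences before oligonucleotides are designed.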

The end result of these changes was to increase the G+C content of the B.t.k. gene from 37% to 41%
while also decreasing the potential plant polyadenylation sites from 18 to 7 and decreasing the ATTTA
regions from 13 to 7. Specifically, the mutagenesis changes from the amino (5') terminus to the carboxy (3')
terminus are as follows:

BTK185 is an 18-mer used to eliminate a plant polyadenylation site in the midst of a nine base pair 
region of A + T. 

BTK240 is a 48-mer. Seven base pairs were changed by this oligonucleotide to eliminate three potential
polyadenylation sites (2 AACCAA, 1 AATTAA). Another region close to the region altered by BTK240,
starting at bp 312, had a high A+T content (13 of 15 base pairs) and an ATTTA region. However, it did not
contain a potential polyadenylation site and its longest string of uninterrupted A+T was seven base pairs.

BTK462 is a 54-mer introducing 13 base pair changes. The first six changes were to reduce the A+T
richness of the gene by replacing wild-type codons with codons containing G and C while avoiding the CG
doublet. The next seven changes made by BTK462 were used to eliminate an A+T rich region (13 of 14
base pairs were A or T) containing two ATTTA regions.

BTK669 is a 48-mer making nine individual base pair changes eliminating three possible polyadenylation
sites (ATATAA, AATCAA, and AATTAA) and a single ATTTA site.

BTK930 is a 39-mer designed to increase the G+C content and to eliminate a potential polyadenylation
site (AATAAT - a major site). This region did contain a nine base pair region of consecutive AT sequence.
One of the base pair changes was a G to A because a G at this position would have created a G+C rich
region (CCGG(G)C). Since sequencing reactions indicate that there can be difficulties generating sequence
through G+C consecutive bases, it was thought to be prudent to avoid generating potentially problematic
regions even if they were problematic only in vitro.

BTK1110 is a 32-mer designed to introduce five changes in the wild-type gene. One potential site
(AATAAT - a major site) was eliminated in the midst of an A+T rich region (19 of 22 base pairs).

BTK1380A and BTK1380T are responsible for 14 individual base pair changes. The first region (1380A)
has 17 consecutive A+T base pairs. In this region is an ATTTA and a potential polyadenylation site
(AATAAT). The 100-mer (1380T) contains all the changes dictated by 1380A. The large size of this primer
was in part an experiment to determine if it was feasible to utilize large oligonucleotides for mutagenesis
(over 60 bases in length). A second consideration was that the 100-mer was used to mutagenize a template
which had previously been mutagenized by 1380A. The original primer ordered to mutagenize the region
downstream and adjacent to 1380A did not anneal efficiently to the desired site as indicated by an inability
to obtain clean sequence utilizing the primer. The large region of homology of 1380T did assure proper
annealing. The extended size of 1380T was more of a convenience rather than a necessity. The second
region adjacent to 1380A covered by 1380T has a high A+T content (22 of 29 bases are A or T).

BTK1600 is a 27-mer responsible for five individual base pair changes. An ATTTA region and a plant
polyadenylation site were identified and the appropriate changes engineered.

A total of 62 bases were changed by site-directed mutagenesis. The G+C content increased by 55
base pairs, the potential polyadenylation sites were reduced from 18 to seven and the ATTTA sequences
decreased from 13 to seven. The changes in the DNA sequence resulted in changes in 55 of the 579
codons in the truncated B.t.k. gene in pMON5342 (approximately 9.5%).
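
The codon figures quoted above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only, with a hypothetical helper name; it assumes that the 579 codons correspond to amino acids 29 through 607 counted inclusively, which matches the stated total but is not spelled out in the text.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Assumption: the truncated gene's codon count is amino acids 29 to 607 inclusive.
total_codons = 607 - 29 + 1
changed_codons = 55

if __name__ == "__main__":
    print(total_codons)
    print(round(percent(changed_codons, total_codons), 1))
```

Under that assumption the inclusive count is 579 codons, and 55 changed codons round to 9.5%, in agreement with the text.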

Referring to Table IV, modified B.t.k. HD-1 genes were constructed that contained all of the above
modifications (pMON5370) or various subsets of individual modifications. These genes were inserted into
pMON893 for plant transformation and tobacco plants containing these genes were analyzed. The analysis
of tobacco plants with the individual modifications was undertaken for several reasons. Expression of the
wild type truncated gene in tobacco is very poor, resulting in infrequent identification of plants toxic to
tobacco hornworm (THW). Toxicity is defined by leaf feeding assays as at least 60% mortality of tobacco
hornworm neonate larvae with a damage rating of 1 or less (scale is 0 to 4; 0 is equivalent to total
protection, 4 total damage).
The modified HD-1 gene (pMON5370) shows a large increase in expression (estimated to be approximately
100-fold; see Table VIII) in tobacco. Therefore, increases in expression of the wild-type gene due to
individual modifications would be apparent as a large increase in the frequency of toxic tobacco plants and
the presence of detectable B.t.k. protein. Results are shown in the following table:

Table IV

Relative Effects of Regional Modifications within the B.t.k. Gene

Construct     Position Modified                             # of Plants   # of Toxic Plants

pMON5370      185, 240, 669, 930, 1110, 1380a+b, 1600       38            22
pMON10707     185, 240, 462, 669                            48            19
pMON10706     930, 1110, 1380a+b, 1600                      43             1
pMON10539     185                                           55             2
pMON10537     240                                           57            17
pMON10540     185, 240                                      88            23
pMON10705     462                                           47             1



The effects of each individual oligonucleotide's changes on expression did reveal some overall trends.
Six different constructs were generated which were designed to identify the key regions. The nine different
oligonucleotides were divided in half by their position on the gene. Changes in the N-terminal half were
incorporated into pMON10707 (185, 240, 462, 669). C-terminal half changes were incorporated into
pMON10706 (930, 1110, 1380a+b, 1600). The results of analysis of plants with these two constructs indicate
that pMON10707 produces a substantial number of toxic plants (19 of 48). Protein from these plants is
detectable by ELISA analysis. pMON10706 plants were rarely identified as insecticidal (1 of 43) and the
levels of B.t.k. were barely detectable by immunological analysis. Investigation of the N-terminal changes in
greater detail was done with 4 pMON constructs: 10539 (185 alone), 10537 (240 alone), 10540 (185 and
240) and 10705 (462 alone). The results indicate that the presence of the changes in 240 were required to
generate a substantial number of toxic plants (pMON10540, 23 of 88; pMON10537, 17 of 57). The absence
of the 240 changes resulted in a low frequency of toxic plants with low B.t.k. protein levels, identical to
results with the wild type gene. These results indicate that the changes in 240 are responsible for a
substantial increase in B.t.k. expression levels over an analogous wild-type construct in tobacco. Changes in
additional regions (185, 462, 669) in conjunction with 240 may result in increases in B.t.k. expression (>2
fold). However, changes at the 240 region of the N-terminal portion of the gene do result in dramatic
increases in expression.
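
To make the comparison among constructs easier to follow, the counts in Table IV can be converted to the percentage of plants scored as toxic. The Python sketch below is illustrative; the function name toxic_percent is hypothetical and the counts are those reported in Table IV.

```python
# (number of plants, number of toxic plants) for each construct, from Table IV.
TABLE_IV = {
    "pMON5370": (38, 22),
    "pMON10707": (48, 19),
    "pMON10706": (43, 1),
    "pMON10539": (55, 2),
    "pMON10537": (57, 17),
    "pMON10540": (88, 23),
    "pMON10705": (47, 1),
}


def toxic_percent(construct):
    """Percentage of assayed plants scored as toxic for the given construct."""
    plants, toxic = TABLE_IV[construct]
    return 100.0 * toxic / plants


if __name__ == "__main__":
    for name in sorted(TABLE_IV, key=toxic_percent, reverse=True):
        print(name, round(toxic_percent(name), 1))
```

Sorted this way, the constructs carrying the 240 changes group at the top and those lacking them at the bottom, which is the trend described in the text.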

Despite the importance of the alteration of the 240 region in expression of modified genes, increased
expression can be achieved by alteration of other regions. Hybrid genes, part wild-type, part synthetic, were
generated to determine the effects of synthetic gene segments on the levels of B.t.k. expression. A hybrid
gene was generated with a synthetic N-terminal third (base pair 1 to 590 of Figure 2: to the XbaI site) with
the C-terminal wild type B.t.k. HD-1 (pMON5378). Plants transformed with this vector were as toxic as plants
transformed with the modified HD-1 gene (pMON5370). This is consistent with the alteration of the 240
region. However, pMON10538, a hybrid with a wild-type N-terminal third (wild type gene for the first 600
base pairs, to the second XbaI site) and a synthetic C-terminal last two-thirds (base pair 590 to 1845 of
Figure 3), was used to transform tobacco and resulted in a dramatic increase in expression. The levels of
expression do not appear to be as high as those seen with the synthetic gene, but are comparable to the
modified gene levels. These results indicate that modification of the 240 segment is not essential to
increased expression since pMON10538 has an intact 240 region. A fully synthetic gene is, in most cases,
superior for expression levels of B.t.k. (See Example 2.)

Example 2 - Fully Synthetic B.t.k. HD-1 Gene

A synthetic B.t.k. HD-1 gene was designed using the preferred plant codons listed in Table V below.
Table V lists the codons and frequency of use in plant genes of dicotyledonous plants compared to the
frequency of their use in the wild type B.t.k. HD-1 gene (amino acids 1-615) and the synthetic gene of this
example. The total number of each amino acid in this segment of the gene is listed in the parentheses under
the amino acid designation.
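
As an illustration of how percent codon usage of the kind tabulated in Table V can be computed from a coding sequence, a minimal Python sketch follows. It is not the procedure used to prepare the table; the function name `codon_usage_percent` and the short input sequence are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter, defaultdict

# Standard genetic code in RNA alphabet, built from the conventional UCAG ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage_percent(dna):
    """Return {amino acid: {codon: percent}} for an in-frame coding sequence.

    Percentages are rounded to whole numbers, as in a usage table, and are
    computed within each amino acid (synonymous codons sum to about 100).
    """
    rna = dna.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    counts = Counter(rna[i:i + 3] for i in range(0, usable, 3))

    by_amino = defaultdict(dict)
    for codon, n in counts.items():
        by_amino[CODON_TABLE[codon]][codon] = n

    return {
        aa: {codon: round(100 * n / sum(codons.values()))
             for codon, n in codons.items()}
        for aa, codons in by_amino.items()
    }


# Hypothetical toy sequence: Lys (AAA), Lys (AAG), Lys (AAG), Leu (CTT).
print(codon_usage_percent("AAAAAGAAGCTT"))
```

Applied to a full coding region, the same per-amino-acid percentages can be compared column by column against a reference usage profile.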

Table V

Codon Usage in Synthetic B.t.k. HD-1 Gene

                              Percent Usage in
Amino Acid    Codon     Plants   Wt B.t.k.     Syn

ARG           CGA            7          11       2
(43)          CGC           11           5       5
              CGG            5           2       0
              CGU           25          14      27
              AGA           29          55      41
              AGG           23          14      25

LEU           CUA            8          16       4
(49)          CUC           20           0
              CUG           10           2       6
              CUU           28          22      24
              UUA            5          50       0
              UUG           30          10      45

SER           UCA           14          27       5
(64)          UCC           26           9      28
              UCG            3           8       0
              UCU           21          19      31
              AGC           21           6      32
              AGU           15          31       5

THR           ACA           21          31      14
(42)          ACC           41          19      53
              ACG            7          14       0
              ACU           31          36      33

PRO           CCA           45          35      53
(34)          CCC           19           6      12
              CCG            9          21       3
              CCU           26          38      32

ALA           GCA           23          38      26
(31)          GCC           32           9      29
              GCG            3           3       0
              GCU           41          50      45

GLY           GGA           32          52      45
(46)          GGC           20          17      15
              GGG           11          15       6
              GGU           37          15      34

ILE           AUA           12          39       2
(46)          AUC           45          11      67
              AUU           43          50      30

VAL           GUA            9          45       3
(38)          GUC           20           5      16
              GUG           28          11      37
              GUU           43          39      45

LYS           AAA           36         100      33
(3)           AAG           64           0      67

ASN           AAC           72          27      80
(44)          AAU           28          73      20

GLN           CAA           64          77      61
(31)          CAG           36          23      39

HIS           CAC           65           0      80
(10)          CAU           35         100      20

GLU           GAA           48          87      50
(30)          GAG           52          13      50

ASP           GAC           48          17      65
(23)          GAU           52          83      35

TYR           UAC           68          20      72
(25)          UAU           32          80      28

CYS           UGC           78          50     100
(2)           UGU           22          50       0

PHE           UUC           56          17      83
(36)          UUU           44          83      17

MET           AUG          100         100     100
(9)

TRP           UGG          100         100     100
(9)

The resulting synthetic gene lacks ATTTA sequences, contains only one potential polyadenylation site
and has a G + C content of 48.5%. Figure 3 is a comparison of the wild-type HD-1 sequence to the
synthetic gene sequence for amino acids 1-615. There is approximately 77% DNA homology between the
synthetic gene and the wild-type gene and 356 of the 615 codons have been changed (approximately
60%).
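
The summary statistics quoted above (G + C content, number of ATTTA sequences, number of changed codons) can be computed along the following lines. This is a hedged sketch only: the helper names are hypothetical, the example sequences are toy strings rather than the HD-1 gene, and the comparison assumes two in-frame sequences of equal length.

```python
def gc_percent(seq):
    """Percent of bases that are G or C."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def count_motif(seq, motif):
    """Count occurrences of a motif, allowing overlapping matches."""
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


def changed_codons(wild_type, synthetic):
    """Number of codon positions that differ between two aligned coding sequences."""
    if len(wild_type) != len(synthetic) or len(wild_type) % 3:
        raise ValueError("sequences must be the same length and a multiple of 3")
    return sum(
        wild_type[i:i + 3].upper() != synthetic[i:i + 3].upper()
        for i in range(0, len(wild_type), 3)
    )


# Hypothetical toy sequences, not taken from the HD-1 gene.
wt = "ATGAAATTTATTTAT"
syn = "ATGAAGTTCATCTAC"
print(gc_percent(wt), gc_percent(syn))
print(count_motif(wt, "ATTTA"), count_motif(syn, "ATTTA"))
print(changed_codons(wt, syn))

# The proportion stated in the text: 356 of 615 codons changed.
print(round(100 * 356 / 615))  # 58, which the text rounds to approximately 60%
```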

Example 3 - Synthetic B.t.k. HD-73 Gene

The crystal protein toxin from B.t.k. HD-73 exhibits a higher unit activity against some important
agricultural pests. The toxin proteins of HD-1 and HD-73 exhibit substantial homology (~90%) in the N-
terminal 450 amino acids, but differ substantially in the amino acid region 451-615. Fusion proteins
comprising amino acids 1-450 of HD-1 and 451-615 of HD-73 exhibit the insecticidal properties of the wild-
type HD-73. The strategy employed was to use the 5'-two thirds of the synthetic HD-1 gene (first 1350
bases, up to the SacI site) and to dramatically modify the final 590 bases (through amino acid 645) of the
HD-73 in a manner consistent with the algorithm used to design the synthetic HD-1 gene. Table VI below
lists the oligonucleotides used to modify the HD-73 gene in the order used in the gene from 5' to 3' end.
Nine oligonucleotides were used in a 590 base pair region, each oligonucleotide ranging in size from 33 to 60
bases. The only regions left unchanged were areas where there were no long consecutive strings of A or T
bases (longer than six). All polyadenylation sites and ATTTA sites were eliminated.

Table VI

Mutagenesis Primers for B.t.k. HD-73

Primer      Length (bp)    Sequence

73K1363     51             AATACTATCG GATGCGATGA
                           TGTTGTTGAA CTCAGCACTA
                           CGGTGTATCC A

73K1437     33             TCCTGAAATG ACAGAACCGT
                           TGAAGAGAAA GTT

73K1471     48             ATTTCCACTG CTGTTGAGTC
                           TAACGAGGTC TCCACCAGTG
                           AATCCTGG

73K1561     60             GTGAATAGGG GTCACAGAAG
                           CATACCTCAC ACGAACTCTA
                           TATCTGGTAG ATGTTGGATGG

73K1642     33             TGTAGCTGGA ACTGTATTGG
                           AGAAGATGGA TGA

73K1675     48             TTCAAAGTAA CCGAAATCGC
                           TGGATTGGAG ATTATCCAAG
                           GAGGTAGC

73K1741     39             ACTAAAGTTT CTAACACCCA
                           CGATGTTACC GAGTGAAGA

73K1797     36             AACTGGAATG AACTCGAATC
                           TGTCGATAAT CACTCC

73KTERM     54             GGACACTAGA TCTTAGTGAT
                           AATCGGTCAC ATTTGTCTTG
                           AGTCCAAGCT GGTT

The resulting gene has two potential polyadenylation sites (compared to 18 in the WT) and no ATTTA
sequence (12 in the WT). The G + C content has increased from 37% to 48%. A total of 59 individual base
pair changes were made using the primers in Table VI. Overall, there is 90% DNA homology between the
region of the HD-73 gene modified by site-directed mutagenesis and the wild-type sequence of the
analogous region of HD-73. The synthetic HD-73 is a hybrid of the first 1360 bases from the synthetic HD-1
and the next 590 or so bases of modified HD-73 sequence. Figure 4 is a comparison of the above-described
synthetic B.t.k. HD-73 and the wild-type B.t.k. HD-73 encoding amino acids 1-645. In the modified region of
the HD-73 gene 44 of the 170 codons (25%) were changed as a result of the site-directed mutagenesis
changes resulting from the oligonucleotides found in Table VI. Overall, approximately 50% of the codons in
the synthetic B.t.k. HD-73 differ from the analogous segment of the wild-type HD-73 gene.

A one base pair deletion in the synthetic HD-73 gene was detected in the course of sequencing the 3'
end at base pair 1890. This results in a frame-shift mutation at amino acid 625 with a premature stop codon
at amino acid 640 (pMON5379). Table VII below compares the codon usage of the wild-type gene of B.t.k.
HD-73 versus the synthetic gene of this example for amino acids 451-645 and codon usage of naturally
occurring genes of dicotyledonous plants. The total number of each amino acid encoded in this segment of
the gene is found in the parentheses under the amino acid designation.
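
To illustrate how a single base deletion of the kind described above shifts the reading frame and can bring a stop codon into frame prematurely, a minimal Python sketch follows. The sequence is a hypothetical toy and is not part of the HD-73 gene; the function names are illustrative assumptions, and the standard genetic code is assumed.

```python
# Standard genetic code in DNA alphabet, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate from the first base until the first in-frame stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3].upper()]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


def delete_base(dna, position):
    """Return the sequence with the base at a 1-based position removed."""
    return dna[:position - 1] + dna[position:]


# Hypothetical toy gene: ATG GCT CTG ATC GGT TGG TAA
intact = "ATGGCTCTGATCGGTTGGTAA"
mutant = delete_base(intact, 7)  # reading frame becomes ATG GCT TGA ...

print(translate(intact))  # MALIGW
print(translate(mutant))  # MA (TGA is now in frame)
```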

Table VII

Codon Usage in Synthetic B.t.k. HD-73 Gene

                              Percent Usage in
Amino Acid    Codon     Plants    Wt HD-73     Syn

ARG           CGA            7          10       0
(10)          CGC           11           0       8
              CGG            5          10       0
              CGU           25          20      23
              AGA           29          60      62
              AGG           23           0       8

LEU           CUA            8          25       8
(12)          CUC           20          17      58
              CUG           10          17       8
              CUU           28           8       0
              UUA            5          33       8
              UUG           30           0      17

SER           UCA           14          24      18
(21)          UCC           26          10      27
              UCG            3          10       0
              UCU           21          24      18
              AGC           21           0      14
              AGU           15          33      23

THR           ACA           21          47      38
(15)          ACC           41          13      31
              ACG            7          13       0
              ACU           31          27      31

PRO           CCA           45          71      71
(7)           CCC           19           0       0
              CCG            9          14       0
              CCU           26          14      29

ALA           GCA           23          29      31
(14)          GCC           32           7       8
              GCG            3          21      15
              GCU           41          43      46

GLY           GGA           32          33      43
(15)          GGC           20           0       0
              GGG           11          27      14
              GGU           37          40      43

ILE           AUA           12          33       7
(15)          AUC           45           7      40
              AUU           43          60      53

VAL           GUA            9          40       7
(15)          GUC           20           0       7
              GUG           28          20      36
              GUU           43          40      50

LYS           AAA           36          67     100
(3)           AAG           64          33       0

ASN           AAC           72          20      53
(20)          AAU           28          80      47

GLN           CAA           64          60      67
(5)           CAG           36          40      33

HIS           CAC           65          67     100
(3)           CAU           35          33       0

GLU           GAA           48          86      57
(7)           GAG           52          14      43

ASP           GAC           48          40      50
(5)           GAU           52          60      50

TYR           UAC           68           0      20
(5)           UAU           32         100      80

CYS           UGC           78           0       0
(0)           UGU           22           0       0

PHE           UUC           56           8      67
(13)          UUU           44          92      33

MET           AUG          100         100     100
(2)

TRP           UGG          100         100     100
(2)

Another truncated synthetic HD-73 gene was constructed. The sequence of this synthetic HD-73 gene
is identical to that of the above synthetic HD-73 gene in the region in which they overlap (amino acids 29-615),
and it also encodes Met-Ala at the N-terminus. Figure 8 shows a comparison of this truncated
synthetic HD-73 gene with the N-terminal Met-Ala versus the wild-type HD-73 gene.

While the previous examples have been directed at the preparation of synthetic and modified genes
encoding truncated B.t.k. proteins, synthetic or modified genes can also be prepared which encode full
length toxin proteins.

One full length B.t.k. gene consists of the synthetic HD-73 sequence of Figure 4 from nucleotide 1-1845
plus wild-type HD-73 sequence encoding amino acids 616 to the C-terminus of the native protein. Figure 9
shows a comparison of this synthetic/wild-type full length HD-73 gene versus the wild-type full length HD-73
gene.




EP 0 385 962 A1 



Another full length B.t.k. gene consists of the synthetic HD-73 sequence of Figure 4 from nucleotide 1-
1845 plus a modified HD-73 sequence encoding amino acids 616 to the C-terminus of the native protein. The
C-terminal portion has been modified by site-directed mutagenesis to remove putative polyadenylation
signals and ATTTA sequences according to the algorithm of Figure 1. Figure 10 shows a comparison of this
synthetic/modified full length HD-73 gene versus the wild-type full length HD-73 gene.

Another full length B.t.k. gene consists of a fully synthetic HD-73 sequence which incorporates the
synthetic HD-73 sequence of Figure 4 from nucleotide 1-1845 plus a synthetic sequence encoding amino
acids 616 to the C-terminus of the native protein. The C-terminal synthetic portion has been designed to
eliminate putative polyadenylation signals and ATTTA sequences and to include plant preferred codons.
Figure 11 shows a comparison of this fully synthetic full length HD-73 gene versus the wild-type full length
HD-73 gene.

Alternatively, another full length B.t.k. gene consists of a fully synthetic sequence comprising base pairs
1-1830 of B.t.k. HD-1 (Figure 3) and base pairs 1834-3534 of B.t.k. HD-73 (Figure 11).



Example 4 - Expression of Modified and Synthetic B.t.k. HD-1 and Synthetic HD-73

A number of plant transformation vectors for the expression of B.t.k. genes were constructed by
incorporating the structural coding sequences of the previously described genes into plant transformation
cassette vector pMON893. The respective intermediate transformation vector is inserted into a suitable
disarmed Agrobacterium vector such as A. tumefaciens ACO, supra. Tissue explants are cocultured with
the disarmed Agrobacterium vector and plants regenerated under selection for kanamycin resistance using
known protocols: tobacco (Horsch et al., 1985); tomato (McCormick et al., 1986) and cotton (Trolinder et al.,
1987).



a) Tobacco. 

The levels of B.t.k. HD-1 protein in transgenic tobacco plants containing pMON9921 (wild type
truncated), pMON5370 (modified HD-1, Example 1, Figure 2) and pMON5377 (synthetic HD-1, Example 2,
Figure 3) were analyzed by Western analysis. Leaf tissue was frozen in liquid nitrogen, ground to a fine
powder and then ground in a 1:2 (wt:volume) ratio of SDS-PAGE sample buffer. Samples were frozen on dry
ice, then incubated for 10 minutes in a boiling water bath and microfuged for 10 minutes. The protein
concentration of the supernatant was determined by the method of Bradford (Anal. Biochem. 72:248-254).
Fifty µg of protein was run per lane on 9% SDS-PAGE gels, the protein transferred to nitrocellulose and the
B.t.k. HD-1 protein visualized using antibodies produced against B.t.k. HD-1 protein as the primary antibody
and alkaline phosphatase conjugated second antibody as described by the manufacturer (Promega,
Madison, WI). Purified HD-1 tryptic fragment was used as the control. Whereas the B.t.k. protein from
tobacco plants containing pMON9921 was below the level of detection, the B.t.k. protein from plants
containing the modified (pMON5370) and synthetic (pMON5377) genes was easily detected. The B.t.k.
protein from plants containing pMON9921 remained undetectable, even with 10 fold longer incubation
times. The relative levels of B.t.k. HD-1 protein in these plants are estimated in Table VIII. Because the
protein from plants containing pMON9921 was not observed, the level of protein in these plants was
estimated from the relative mRNA levels (see below). Plants containing the modified gene (pMON5370)
expressed approximately 100 fold more B.t.k. protein than plants containing the wild-type gene
(pMON9921). Plants containing the fully synthetic B.t.k. HD-1 gene (pMON5377) expressed approximately
five fold more protein than plants containing the modified gene. The modified gene contributes the majority
of the increase in B.t.k. expression observed. The plants used to generate the above data are the best
representatives from each construct based either on a tobacco hornworm bioassay or on data derived from
previous Western analysis.
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Table VIII 



Expression of B.t.k. HD-1 Protein in Transgenic Tobacco

Gene            Vector        B.t.k. Protein*    Fold Increase in
Description                   Concentration      B.t.k. Expression

Wild type       pMON9921        10                  1
Modified        pMON5370      1000                100
Synthetic       pMON5377      5000                500

* B.t.k. protein concentrations are expressed in ng/mg of total soluble protein. The
level of B.t.k. protein for plants containing the wild type gene is estimated from
RNA levels.



Plants containing these genes were tested for bioactivity to determine whether the increased quantities
of protein observed by Western analysis result in a corresponding increase in bioactivity. Leaves from the
same plants used for the Western data in Table VIII were tested for bioactivity against two insects. A
detached leaf bioassay was first done using tobacco hornworm, an extremely sensitive lepidopteran insect.
Leaves from all three transgenic tobacco plants were totally protected and 100% mortality of tobacco
hornworm was observed (see Table IX below). A much less sensitive insect, beet armyworm, was then used in
another detached leaf bioassay. Beet armyworm is approximately 500 fold less sensitive to B.t.k. HD-1
protein than tobacco hornworm. The difference in sensitivity of these two insects was determined using
purified HD-1 protein in a diet incorporation assay (see below). Plants containing the wild-type gene
(pMON9921) showed only minimal protection against beet armyworm, whereas plants containing the
modified gene showed almost complete protection and plants containing the fully synthetic gene were
totally protected against beet armyworm damage. The results of these bioassays confirm the levels of B.t.k.
HD-1 expression observed in the Western analysis and demonstrate that the increased levels of B.t.k. HD-1
protein correlate with increased insecticidal activity.

Table IX

Protection of Tobacco Plants from Tobacco Hornworm and Beet Armyworm

Gene            Vector        Tobacco Hornworm    Beet Armyworm
Description                   Damage*             Damage*

None            None          NL                  NL
Wild type       pMON9921      0                   3
Modified        pMON5370      0                   1
Synthetic       pMON5377      0                   0

* Extent of insect damage was rated: 0, no damage; 1, slight; 2, moderate; 3,
severe; or NL, no leaf left.



The bioactivity of the B.t.k. HD-1 protein produced by these transgenic plants was further investigated
to more accurately quantitate the relative activities. Leaf tissue from tobacco plants containing the wild-type,
modified and synthetic genes was ground in 100 mM sodium carbonate buffer, pH 10, at a 1:2 (wt:vol) ratio.
Particulate material was removed by centrifugation. The supernatant was incorporated into a synthetic diet
similar to that described by Marrone et al. (1985). The diet medium was prepared the day of the test with
the plant extract solutions incorporated in place of the 20% water component. One ml of the diet was
aliquoted into 96 well plates.

After the diet dried, one neonate tobacco budworm larva was added to each well. Sixteen insects were
tested with each plant sample. The plates were incubated at 27°C. After seven days, the larvae from each
treatment were combined and weighed on an analytical balance. The average weight per insect was
calculated and compared to a standard curve relating B.t.k. protein concentrations to average larval weight.
Insect weight was inversely proportional (in a logarithmic manner) to the relative increase in B.t.k. protein
concentration. The amount of B.t.k. HD-1 protein, based on the extent of larval growth inhibition, was
determined for two different plants containing each of the three genes. The specific activity (ng of B.t.k. HD-1
per mg of plant protein) was determined for each plant. Plants containing the modified HD-1 gene
(pMON5370) averaged approximately 1400 ng (1200 and 1600 ng) of B.t.k. HD-1 per mg of plant extract
protein. This value compares closely with the 1000 ng of B.t.k. HD-1 protein per mg of plant extract protein
as determined by Western analysis (Table VIII). B.t.k. HD-1 concentrations for the plants containing the
synthetic HD-1 gene averaged approximately 8200 ng (7200 and 9200 ng) of B.t.k. HD-1 protein per mg of
plant extract protein. This number compares well to the 5000 ng of HD-1 protein per mg of plant extract
protein estimated by Western analysis. Likewise, plants containing the synthetic gene showed approximately
a six-fold higher specific activity than the corresponding plants containing the modified gene for
these bioassays. In the Western analysis the ratio was approximately 10 fold; again, both are in good
agreement. The level of B.t.k. protein in plants containing the wild-type HD-1 gene (pMON9921) was too low
to give a significant decrease in larval weight and hence was below a level that could be quantitated in this
assay. In conclusion, the levels of B.t.k. HD-1 protein determined by both the bioassays and the Western
analysis for these plants containing the modified and synthetic genes agree, which demonstrates that the
B.t.k. HD-1 protein produced by these plants is biologically active.
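
The read-back of a protein concentration from a log-linear standard curve, as described above, can be sketched as follows. The standard-curve points and function names here are hypothetical and chosen only so the arithmetic is easy to follow; they are not the values used in the assay. Only the plant averages (1200 and 1600 ng; 7200 and 9200 ng) come from the text.

```python
import math


def fit_log_linear(concentrations, weights):
    """Least-squares fit of weight = a + b * log10(concentration)."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(weights) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, weights))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b


def concentration_from_weight(weight, a, b):
    """Invert the fitted curve to estimate concentration from a larval weight."""
    return 10 ** ((weight - a) / b)


# Hypothetical standards: larval weight falls 10 units per ten-fold more protein.
std_conc = [10.0, 100.0, 1000.0]   # assumed units: ng protein per ml diet
std_weight = [30.0, 20.0, 10.0]    # assumed units: mg per larva
a, b = fit_log_linear(std_conc, std_weight)
print(concentration_from_weight(15.0, a, b))  # falls between the 100 and 1000 standards

# Averages and ratio quoted in the text for the two plants per construct.
modified = (1200 + 1600) / 2    # 1400 ng per mg
synthetic = (7200 + 9200) / 2   # 8200 ng per mg
print(synthetic / modified)     # roughly 5.9, i.e. approximately six-fold
```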

The levels of mRNA were determined in the plants containing the wild-type B.t.k. HD-1 gene
(pMON9921) and the modified gene (pMON5370) to establish whether the increased levels of protein
production result from increased transcription or translation. mRNA from plants containing the synthetic
gene could not be analyzed directly with the same DNA probe as used for the wild-type and modified
genes because of the numerous changes made in the coding sequence. mRNA was isolated and hybridized
with a single-stranded DNA probe homologous to approximately the 5' 90 bp of the wild-type or modified
gene coding sequences. The hybrids were digested with S1 nuclease and the protected probe fragments
analyzed by gel electrophoresis. Because the procedure used a large excess of probe and long hybridization
time, the amount of protected probe is proportional to the amount of B.t.k. mRNA present in the
sample. Two plants expressing the modified gene (pMON5370) were found to produce up to ten-fold more
RNA than a plant expressing the wild-type gene (pMON9921).

The increased mRNA level from the modified gene is consistent with the result expected from the
modifications introduced into this gene. However, this 10 fold increase in mRNA with the modified gene
compared to the wild-type gene is in contrast to the 100 fold increase in B.t.k. protein from these genes in
tobacco plants. If the two mRNAs were equally well translated then a 10 fold increase in stable mRNA
would be expected to yield a 10 fold increase in protein. The higher increase in protein indicates that the
modified gene mRNA is translated at about a 10 fold higher efficiency than wild-type. Thus, about half of
the total effect on gene expression can be explained by changes in mRNA levels and about half by changes
in translational efficiency. This increase in translational efficiency is striking in that only about 9.5% of the
codons have been changed in the modified gene; that is, this effect is clearly not due to wholesale codon
usage changes. The increased translational efficiency could be due to changes in mRNA secondary
structure that affect translation or to the removal of specific translational blockades due to specific codons
that were changed.
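
The arithmetic behind this attribution can be made explicit. The sketch below assumes, as the text does, that protein level is the product of mRNA level and translational efficiency; the "about half" split then corresponds to equal contributions on a logarithmic scale. The function names are illustrative only.

```python
import math


def translational_fold(protein_fold, mrna_fold):
    """Fold change in translational efficiency implied by protein = mRNA x efficiency."""
    return protein_fold / mrna_fold


def log_share(component_fold, total_fold):
    """Fraction of the total fold increase attributable to one factor, on a log scale."""
    return math.log10(component_fold) / math.log10(total_fold)


protein_fold = 100.0  # modified versus wild-type protein (from the text)
mrna_fold = 10.0      # modified versus wild-type mRNA (from the text)

efficiency_fold = translational_fold(protein_fold, mrna_fold)
print(efficiency_fold)                           # 10.0
print(log_share(mrna_fold, protein_fold))        # 0.5
print(log_share(efficiency_fold, protein_fold))  # 0.5
```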

The increased expression seen with the synthetic HD-1 gene was also seen with a synthetic HD-73
gene in tobacco. B.t.k. HD-73 was undetected in extracts of tobacco plants containing the wild-type
truncated HD-73 gene (pMON5367), whereas B.t.k. HD-73 protein was easily detected in extracts from
tobacco plants containing the synthetic HD-73 gene of Figure 4 (pMON5383). Approximately 1000 ng of
B.t.k. HD-73 protein was detected per mg of total soluble plant protein.

As described in Example 3 above, the B.t.k. HD-73 protein encoded in pMON5383 contains a small C-
terminal extension of amino acids not encoded in the wild-type HD-73 protein. These extra amino acids had
no effect on insect toxicity or on increased plant expression. A second synthetic HD-73 gene was
constructed as described in Example 3 (Figure 8) and used to transform tobacco (pMON5390). Analysis of
plants containing pMON5390 showed that this gene was expressed at levels comparable to that of
pMON5383 and that these plants had similar insecticidal efficacy.

In tobacco plants the synthetic HD-1 gene was expressed at approximately a 5-fold higher level than
the synthetic HD-73 gene. However, this synthetic HD-73 gene still was expressed at least 100-fold better
than the wild-type HD-73 gene. The HD-73 protein is approximately 5-fold more toxic to many insect pests
than the HD-1 protein, so both synthetic HD-1 and HD-73 genes provide approximately comparable
insecticidal efficacy in tobacco.

The full length B.t.k. HD-73 genes described in Example 3 were also incorporated into the plant
transformation vector pMON893 so that they were expressed from the En 35S promoter. The synthetic/wild-
type full length HD-73 gene of Figure 9 was incorporated into pMON893 to create pMON10506. The
synthetic/modified full length HD-73 gene of Figure 10 was incorporated into pMON893 to create
pMON10526. The fully synthetic HD-73 gene of Figure 11 was incorporated into pMON893 to create
pMON10518. These vectors were used to obtain transformed tobacco plants, and the plants were analyzed
for insecticidal efficacy and for B.t.k. HD-73 protein levels by Western blot or ELISA immunoassay.

Tobacco plants containing all three of these full length B.t.k. genes produced detectable B.t.k. protein
and showed 100% mortality of tobacco hornworm. This result is surprising in light of previous reported
attempts to express the full length B.t.k. genes in transgenic plants. Vaeck et al. (1987) reported that a full
length B.t. berliner gene similar to our HD-1 gene could not be detectably expressed in tobacco. Barton
et al. (1987) reported a similar result for another full length gene from B.t.k. HD-1 (the so called 4.5 kb
gene), and further indicated that tobacco callus containing this gene became necrotic, indicating that the full
length gene product was toxic to plant cells. Fischhoff et al. (1987) reported that the full length B.t.k. HD-1
gene in tomato was poorly expressed compared to a truncated gene, and no plants that were fully toxic to
tobacco hornworm could be recovered. All three of the above reports indicated much higher expression
levels and recovery of toxic plants if the respective B.t.k. genes were truncated. Adang et al. reported that
the full length HD-73 gene yielded a few tobacco plants with some biological activity (none were highly
toxic) against hornworm and barely detectable B.t.k. protein. It was also noted by them that the major B.t.k.
mRNA in these plants was a truncated 1.7 kb species that would not encode a functional toxin. This
indicated improper expression of the gene in tobacco. In contrast to all of these reports, the three full length
B.t.k. HD-73 genes described above all lead to relatively high levels of protein and high levels of insect
toxicity.

B.t.k. protein and mRNA levels in tobacco plants are shown in Table X for these three vectors. As can
be seen from the table, the synthetic/wild-type gene (pMON10506) produces B.t.k. protein as about 0.01%
of total soluble protein; the synthetic/modified gene produces B.t.k. as about 0.02% of total soluble protein;
and the fully synthetic gene produces B.t.k. as about 0.2% of total soluble protein. B.t.k. mRNA was
analyzed in these plants by Northern blot analysis using the common 5' synthetic half of the genes as a
probe. As shown in Table X, the increased protein levels can largely be attributed to increased mRNA
levels. Compared to the truncated modified and synthetic genes, this could indicate that the major
contributors to increased translational efficiency are in the 5' half of the gene while the 3' half of the gene
contains mostly determinants of mRNA stability. The increased protein levels also indicate that increasing
the amount of the full length gene that is synthetic or modified increases B.t.k. protein levels. Compared to
the truncated synthetic B.t.k. HD-73 genes (pMON5383 or pMON5390), the fully synthetic gene
(pMON10518) produces as much or slightly more B.t.k. protein, demonstrating that the full length genes are
capable of being expressed at high levels in plants. These tobacco plants with high levels of full length HD-
73 protein show no evidence of abnormality and are fully fertile. The B.t.k. protein levels in these plants also
produce the expected levels of insect toxicity based on feeding studies with beet armyworm or diet
incorporation assays of plant extracts with tobacco budworm. The B.t.k. protein detected by Western blot
analysis in these tobacco plants often contains a varying amount of protein of about 80 kDa which is
apparently a proteolytic fragment of the full length protein. The C-terminal half of the full length protein is
known to be proteolytically sensitive, and similar proteolytic fragments are seen from the full length gene in
E. coli and B.t. itself. These fragments are fully insecticidal. The Northern analysis indicated that essentially
all of the mRNA from these full length genes was of the expected full length size. There is no evidence of
truncated mRNAs that could give rise to the 80 kDa protein fragment. In addition, it is possible that the
fragment is not present in intact plant cells and is merely due to proteolysis during extraction for
immunoassay.

Table X



Full Length B.t.k. HD-73 Protein and mRNA Levels in Transgenic Tobacco Plants

Gene description       Vector         B.t.k. protein     Relative B.t.k.
                                      concentration      mRNA level

Synthetic/wild type    pMON10506      >100               0.5
Synthetic/modified     pMON10526      400                1
Fully synthetic        pMON10518      >2000              40



Thus, there is no serious impediment to producing high levels of B.t.k. HD-73 protein in plants from
synthetic genes, and this is expected to be true of other full length lepidopteran active genes such as B.t.k.
HD-1 or B.t. entomocidus. The fully synthetic B.t.k. HD-1 gene of Example 3 has been assembled in plant
transformation vectors such as pMON893.

The fully synthetic gene in pMON10518 was also utilized in another plant vector and analyzed in
tobacco plants. Although the CaMV35S promoter is generally a high level constitutive promoter in most
plant tissues, the expression level of genes driven by the CaMV35S promoter is low in floral tissue relative to
the levels seen in leaf tissue. Because the economically important targets damaged by some insects are
the floral parts or derived from floral parts (e.g., cotton squares and bolls, tobacco buds, tomato buds and
fruit), it may be advantageous to increase the expression of B.t. protein in these tissues over that obtained
with the CaMV35S promoter.

The 35S promoter of Figwort Mosaic Virus (FMV) is analogous to the CaMV35S promoter. This
promoter has been isolated and engineered into a plant transformation vector analogous to pMON893.
Relative to the CaMV promoter, the FMV 35S promoter is highly expressed in the floral tissue, while still
providing similar high levels of gene expression in other tissues such as leaf. A plant transformation vector,
pMON10517, was constructed in which the full length synthetic B.t.k. HD-73 gene of Figure 11 was driven
by the FMV 35S promoter. This vector is identical to pMON10518 of Example 3 except that the FMV
promoter is substituted for the CaMV promoter. Tobacco plants transformed with pMON10517 and
pMON10518 were obtained and compared for expression of the B.t.k. protein by Western blot or ELISA
immunoassay in leaf and floral tissue. This analysis showed that pMON10517 containing the FMV promoter
expressed the full length HD-73 protein at higher levels in floral tissue than pMON10518 containing the
CaMV promoter. Expression of the full length B.t.k. HD-73 protein from pMON10517 in leaf tissue is
comparable to that seen with the most highly expressing plants containing pMON10518. However, when
floral tissue was analyzed, tobacco plants containing pMON10518 that had high levels of B.t.k. protein in
leaf tissue did not have detectable B.t.k. protein in the flowers. On the other hand, flowers of tobacco plants
containing pMON10517 had levels of B.t.k. protein nearly as high as the levels in leaves at approximately
0.05% of total soluble protein. This analysis showed that the FMV promoter could be used to produce
relatively high levels of B.t.k. protein in floral tissue compared to the CaMV promoter.



b) Tomato. 

The wild-type, modified and synthetic B.t.k. HD-1 genes tested in tobacco were introduced into other
plants to demonstrate the broad utility of this invention. Transgenic tomatoes were produced which contain
these three genes. Data show that the increased expression observed with the modified and synthetic gene
in tobacco also extends to tomato. Whereas the B.t.k. HD-1 protein is only barely detectable in plants
containing the wild type HD-1 gene (pMON9921), B.t.k. HD-1 was readily detected and the levels
determined for plants containing the modified (pMON5370) or synthetic (pMON5377) genes. Expression
levels for the plants containing the wild-type, modified and synthetic HD-1 genes were approximately 10,
100 and 500 ng per mg of total plant extract (see Table XI below). The increase in B.t.k. HD-1 protein for the
modified gene accounted for the majority of increase observed; 10 fold higher than the plants containing the
wild-type gene, compared to only an additional five-fold increase for plants containing the synthetic gene.
Again the site-directed changes made in the modified gene are the major contributors to the increased
expression of B.t.k. HD-1.
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Table XI 



B.tk, HD-1 Expressfon in Transgenic Tomato Plants 


Gene 


Vector 


BXk. Protein* 


Fold Increase in 


Description 




Concentration 


B.tX Expression 


Wild type 


pMON9921 


10 


1 


Modified 


PMON5370 


100 


10 


Synthetic 


PMON5377 


500 


50 



• BXk. HD-1 protein concentrations are expressed in ng/mg of total soluble plant 
protein. Data for plants containing the wiid-type gene are estimates from mRNA 
levels and protein levels detemnined b ELISA. 
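
The fold increases reported for tomato follow directly from the approximate protein concentrations. The following is a minimal sketch of that arithmetic in Python; the variable and function names are illustrative assumptions and do not come from the original text.

```python
# Approximate B.t.k. HD-1 protein levels reported for tomato (Table XI),
# in ng per mg of total soluble plant protein.
levels = {"wild type": 10, "modified": 100, "synthetic": 500}


def fold_increase(level, baseline):
    """Return the ratio of an expression level to a baseline level."""
    return level / baseline


# Fold increase of each gene relative to the wild-type gene.
fold_vs_wild_type = {
    gene: fold_increase(value, levels["wild type"]) for gene, value in levels.items()
}

# Stepwise contribution of each redesign stage.
modified_step = fold_increase(levels["modified"], levels["wild type"])
synthetic_step = fold_increase(levels["synthetic"], levels["modified"])

print(fold_vs_wild_type)
print(modified_step, synthetic_step)
```

Separating the two steps makes the point of the text explicit: the site-directed changes account for the larger factor, and the fully synthetic design adds a smaller one on top.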



These differences in B.t.k. HD-1 expression were confirmed with bioassays against tobacco hornworm
and beet armyworm. Leaves from tomato plants containing each of these genes controlled tobacco
hornworm damage and produced 100% mortality. With beet armyworm, leaves from plants containing the
wild-type HD-1 gene (pMON9921) showed significant damage, leaves from plants containing the modified
gene (pMON5370) showed less damage and leaves from plants containing the synthetic gene (pMON5377)
were completely protected (see Table XII below).

Table XII

Protection of Tomato Plants from Tobacco Hornworm and Beet Armyworm

Gene           Vector      Tobacco Hornworm    Beet Armyworm
Description                Damage*             Damage*

None           None        NL                  NL
Wild type      pMON9921    0                   3
Modified       pMON5370    0                   1
Synthetic      pMON5377    0                   0

* Damage was rated as shown in Table IX.



The generality of the synthetic gene approach was extended in tomato with a synthetic B.t.k. HD-73
gene.

In tomato, extracts from plants containing the wild-type truncated HD-73 gene (pMON5367) showed no
detectable HD-73 protein. Extracts from plants containing the synthetic HD-73 gene (pMON5383) showed
high levels of B.t.k. HD-73 protein, approximately 2000 ng per mg of plant extract protein. These data
clearly demonstrate that the changes made in the synthetic HD-73 gene lead to dramatic increases in the
expression of the HD-73 protein in tomato as well as in tobacco.

In contrast to tobacco, the synthetic HD-73 gene in tomato is expressed at approximately 4-fold to
5-fold higher levels than the synthetic HD-1 gene. Because the HD-73 protein is about 5-fold more active than
the HD-1 protein against many insect pests including Heliothis species, the increased expression of
synthetic HD-73 compared to synthetic HD-1 corresponds to about a 25-fold increased insecticidal efficacy
in tomato.
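
The efficacy estimate above is the product of two ratios: relative expression and relative specific activity. A short sketch of the calculation, assuming the two factors simply multiply (the names are illustrative, not from the original):

```python
# Synthetic HD-73 vs. synthetic HD-1 expression in tomato (approximate range).
expression_ratio = (4.0, 5.0)
# HD-73 vs. HD-1 activity against pests such as Heliothis species (approximate).
relative_activity = 5.0

# Assumption: overall efficacy scales as expression times specific activity.
efficacy_range = tuple(ratio * relative_activity for ratio in expression_ratio)

print(efficacy_range)  # (20.0, 25.0)
```

Under this assumption the combined gain is roughly 20-fold to 25-fold, and the figure quoted in the text corresponds to the upper end of that range.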

In order to determine the mechanisms involved in the increased expression of modified and synthetic
B.t.k. HD-1 genes in tomato, S1 nuclease analysis of mRNA levels from transformed tomato plants was
performed. As indicated above, a similar analysis had been performed with tobacco plants, and this analysis
showed that the modified gene produced up to 10-fold more mRNA than the wild-type gene. The analysis in
tomato utilized a different DNA probe that allowed the analysis of wild-type (pMON9921), modified
(pMON5370) and synthetic (pMON5377) HD-1 genes with the same probe. This probe was derived from the
5' untranslated region of the CaMV35S promoter in pMON893 that was common to all three of these
vectors (pMON9921, pMON5370 and pMON5377). This S1 analysis indicated that B.t.k. mRNA levels from
the modified gene were 3 to 5 fold higher than for the wild-type gene, and that mRNA levels for the
synthetic gene were about 2 to 3 fold higher than for the modified gene. Three independent transformants
were analyzed for each gene. Compared to the fold increases in B.t.k. HD-1 protein from these genes in
tomato shown in Table XI, these mRNA increases can explain about half of the total protein increase, as was
seen in tobacco for the wild-type and modified genes. For tomato the total mRNA increase from wild-type to
synthetic is about 6 to 15 fold compared to a protein increase of about 50 fold. This result is similar to that
seen for tobacco in comparing the wild-type and modified genes, and it extends to the synthetic gene as
well. That is, about half of the total fold increase in B.t.k. protein from wild-type to modified genes can be
explained by mRNA increases and about half by enhanced translational efficiency. The same is also true in
comparing the modified gene to the synthetic gene. Although there is an additional increase in RNA levels,
this mRNA increase can explain only about half of the total protein increase.
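
One way to read "about half" is on a logarithmic scale, where fold changes add rather than multiply. The sketch below is an interpretation under that assumption, not a calculation given in the original; the function name `log_share` is hypothetical.

```python
import math


def log_share(mrna_fold, protein_fold):
    """Fraction of a protein fold increase accounted for by mRNA, on a log scale."""
    return math.log(mrna_fold) / math.log(protein_fold)


wild_type_to_modified = (3, 5)   # mRNA fold increase, low and high estimates
modified_to_synthetic = (2, 3)   # additional mRNA fold increase
protein_fold_total = 50          # wild type to synthetic, from Table XI

# Combined mRNA increase from wild type to synthetic: 6 to 15 fold.
total_mrna_fold = (
    wild_type_to_modified[0] * modified_to_synthetic[0],
    wild_type_to_modified[1] * modified_to_synthetic[1],
)

# Share of the protein increase explained by mRNA, and the residual factor
# that the text attributes to enhanced translational efficiency.
shares = [log_share(fold, protein_fold_total) for fold in total_mrna_fold]
residual_fold = [protein_fold_total / fold for fold in total_mrna_fold]

print(total_mrna_fold)
print([round(s, 2) for s in shares])
print([round(r, 1) for r in residual_fold])
```

On this reading the mRNA increase covers somewhere between roughly 45% and 70% of the log-scale protein increase, leaving a residual factor of about 3 to 8 fold.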

The full length B.t.k. genes described above were also used to transform tomato plants and these plants
were analyzed for B.t.k. protein and insecticidal efficacy. The results of this analysis are shown in Table XIII.
Plants containing the synthetic/wild-type gene (pMON10506) produce the B.t.k. HD-73 protein at levels of
about 0.01% of their total soluble protein. Plants containing the synthetic/modified gene (pMON10526)
produce about 0.04% B.t.k. protein, and plants containing the fully synthetic gene (pMON10518) produce
about 0.2% B.t.k. protein. These results are very similar to the tobacco plant results for the same genes.
mRNA levels estimated by Northern blot analysis in tomato also increase in parallel with the protein level
increase. As for tobacco with these three genes, most of the protein increase can be attributed to increased
mRNA with a small component of translational efficiency increase indicated for the fully synthetic gene. The
highest levels of full length B.t.k. protein (from pMON10518) are comparable to or just slightly lower than
the highest levels observed for the truncated HD-73 genes (pMON5383 and pMON5390). Tomato plants
expressing these full length genes have the insecticidal activity expected for the observed protein levels as
determined by feeding assays with beet armyworm or by diet incorporation of plant extracts with tobacco
hornworm.

Table XIII

Full Length B.t.k. HD-73 Protein and mRNA Levels in Transgenic Tomato Plants

Gene description       Vector       B.t.k. protein    Relative B.t.k.
                                    concentration     mRNA level

Synthetic/wild type    pMON10506    100               1
Synthetic/modified     pMON10526    400               2-4
Fully synthetic        pMON10518    2000              10
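
The text quotes full length B.t.k. protein levels as percentages of total soluble protein, while Table XIII lists bare numbers. A minimal sketch of the conversion, assuming the table values are in ng of B.t.k. protein per mg of total soluble protein (the unit is inferred from the percentages in the text, and the function name is illustrative):

```python
def ng_per_mg_to_percent(ng_per_mg):
    """Convert ng of protein per mg of total protein to a percentage.

    1 mg = 1,000,000 ng, so percent = ng_per_mg / 1,000,000 * 100,
    which simplifies to ng_per_mg / 10,000.
    """
    return ng_per_mg / 10_000


# Values from Table XIII (assumed unit: ng/mg of total soluble protein).
table_xiii = {"pMON10506": 100, "pMON10526": 400, "pMON10518": 2000}

percent = {vector: ng_per_mg_to_percent(v) for vector, v in table_xiii.items()}

for vector, value in percent.items():
    print(f"{vector}: {value:.2f}% of total soluble protein")
```

The same conversion applies to the truncated genes discussed elsewhere in this example, where 1000 to 2000 ng/mg is quoted as 0.1% to 0.2%.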



c) Cotton.

The generality of the increased expression of B.t.k. HD-1 and B.t.k. HD-73 by use of the modified and
synthetic genes was extended to cotton. Transgenic calli were produced which contain the wild-type
(pMON9921) and the synthetic HD-1 (pMON5377) genes. Here again the B.t.k. HD-1 protein product from
calli containing the wild-type gene was not detected, whereas calli containing the synthetic HD-1 gene
expressed the HD-1 protein at easily detectable levels. The HD-1 protein was produced at approximately
1000 ng/mg of plant calli extract protein. Again, to ensure that the protein produced by the transgenic cotton
calli was biologically active and that the increased expression observed with the synthetic gene translated to
increased biological activity, extracts of cotton calli were made in similar manner as described for tobacco
plants, except that the calli were first dried between Whatman filter paper to remove as much of the water as
possible. The dried calli were then ground in liquid nitrogen and ground in 100 mM sodium carbonate
buffer, pH 10. Approximately 0.5 ml aliquots of this material were applied to tomato leaves with a paint
brush. After the leaf dried, five tobacco hornworm larvae were applied to each of two leaf samples. Leaves
painted with extract from control calli were completely destroyed. Leaves painted with extract from calli
containing the wild-type HD-1 gene (pMON9921) showed severe damage. Leaves painted with extract from
calli containing the synthetic HD-1 gene (pMON5377) showed no damage (see Table XIV below).



Table XIV

Protection against Tobacco Hornworm by Tomato Leaves
Painted with Extracts Prepared from Cotton Calli
Containing a Control, the Wild-Type B.t.k. HD-1 Gene,
Synthetic HD-1 Gene or Synthetic HD-73 Gene

Gene               Vector      Tobacco Hornworm
Description                    Damage*

Control            Control     NL
Wild type HD-1     pMON9921    3
Synthetic HD-1     pMON5377    0
Synthetic HD-73    pMON5383    0



Cotton calli were also produced containing another synthetic gene, a gene encoding B.t.k. HD-73. The
preparation of this gene is described in Example 3. Calli containing the synthetic HD-73 gene produced the
corresponding HD-73 protein at even higher levels than the calli which contained the synthetic HD-1 gene.
Extracts made from calli containing the HD-73 synthetic gene (pMON5383) showed complete control of
tobacco hornworm when painted onto tomato leaves as described above for extracts containing the HD-1
protein (see Table XIV).

Transgenic cotton plants containing the synthetic B.t.k. HD-1 gene (pMON5377) or the synthetic B.t.k.
HD-73 gene (pMON5383) have also been examined. These plants produce the HD-1 or HD-73 proteins at
levels comparable to that seen in cotton callus with the same genes and comparable to tomato and tobacco
plants with these genes. For either synthetic truncated HD-1 or HD-73 genes, cotton plants expressing B.t.k.
protein at 1000 to 2000 ng/mg total protein (0.1% to 0.2%) were recovered at a high frequency. Insect
feeding assays were performed with leaves from cotton plants expressing the synthetic HD-1 or HD-73
genes. These leaves showed no damage (rating of 0) when challenged with larvae of cabbage looper
(Trichoplusia ni), and only slight damage when challenged with larvae of beet armyworm (Spodoptera
exigua). Damage ratings are as defined in Table VIII above. This demonstrated that cotton plants as well as
calli expressed the synthetic HD-1 or HD-73 genes at high levels and that those plants were protected from
damage by Lepidopteran insect larvae.

Transgenic cotton plants containing either the synthetic truncated HD-1 gene (pMON5377) or the
synthetic truncated HD-73 gene (pMON5383) were also assessed for protection against cotton bollworm at
the whole plant level in the greenhouse. This is a more realistic test of the ability of these plants to produce
an agriculturally acceptable level of control. The cotton bollworm (Heliothis zea) is a major pest of cotton
that produces economic damage by destroying terminals, squares and bolls, and protection of these fruiting
bodies as well as the leaf tissue will be important for effective insect control and adequate crop protection.
To test the protection afforded to whole plants, R1 progeny of cotton plants expressing high levels of either
B.t.k. HD-1 (pMON5377) or B.t.k. HD-73 (pMON5383) were assayed by applying 10-15 eggs of cotton
bollworm per boll or square to the 20 uppermost squares or bolls on each plant. At least 12 plants were
analyzed per treatment. The hatch rate of the eggs was approximately 70%. This corresponds to very high
insect pressure compared to numbers of larvae per plant seen under typical field conditions. Under these
conditions 100% of the bolls on control cotton plants were destroyed by insect damage. For the
transgenics, significant boll protection was observed. Plants containing pMON5377 (HD-1) had 70-75% of
the bolls survive the intense pressure of this assay. Plants containing pMON5383 (HD-73) had 80% to 90%
boll protection. This is likely to be a consequence of the higher activity of HD-73 protein against cotton
bollworm compared to HD-1 protein. In cases where the transgenic plants were damaged by the insects,
the surviving larvae were delayed in their development by at least one instar.
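
The insect pressure in this whole plant assay can be estimated from the numbers given above. The following is a rough sketch, assuming every treated square or boll receives the stated number of eggs and that the hatch rate applies uniformly; the variable names are illustrative.

```python
eggs_per_site = (10, 15)   # eggs applied per square or boll (low, high)
sites_per_plant = 20       # uppermost squares or bolls treated on each plant
hatch_rate = 0.70          # approximate fraction of eggs that hatched

# Expected number of larvae per plant at the low and high egg counts.
larvae_per_plant = tuple(
    eggs * sites_per_plant * hatch_rate for eggs in eggs_per_site
)

print([round(n) for n in larvae_per_plant])
```

Under these assumptions each plant carries on the order of 140 to 210 larvae, which is the sense in which the text describes the pressure as very high relative to typical field conditions.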

Therefore, the increased expression obtained with the modified and synthetic genes is not limited to
any one crop; tobacco, tomato and cotton calli and cotton plants all showed drastic increases in B.t.k.
expression when the plants/calli were produced containing the modified or synthetic genes. Likewise, the
utility of changes made to produce the modified and synthetic B.t.k. HD-1 gene is not limited to the HD-1
gene. The synthetic HD-73 gene in all three species also showed drastic increases in expression.

In summary, it has been demonstrated that: (1) the genetic changes made in the HD-1 modified gene
led to very significant increases in B.t.k. HD-1 expression; (2) production of a totally synthetic gene led to
a further five-fold increase in B.t.k. HD-1 expression; (3) the changes incorporated into the modified HD-1
gene accounted for the majority of the increased B.t.k. expression observed with the synthetic gene; (4) the
increased expression was demonstrated in three different plants - tobacco plants, tomato plants and cotton
calli and cotton plants; (5) the increased expression as observed by Western analysis also correlated with
similar increases in bioactivity, showing that the B.t.k. HD-1 proteins produced were comparably active; (6)
when the method of the present invention used to design the synthetic HD-1 gene was employed to design
a synthetic HD-73 gene, it also was expressed at much higher levels in tobacco, tomato and cotton than the
wild-type equivalent gene, with consequent increases in bioactivity; (7) a fully synthetic full length B.t.k. gene
was expressed at levels comparable to synthetic truncated genes.



Example 5 - Synthetic B.t. tenebrionis Gene in Tobacco, Tomato and Potato

Referring to Figure 12, a synthetic gene encoding a Coleopteran active toxin is prepared by making the
indicated changes in the wild-type gene of B.t. tenebrionis or by de novo synthesis of the synthetic structural
gene. The synthetic gene is inserted into an intermediate plant transformation vector such as pMON893.
Plasmid pMON893 containing the synthetic B.t.t. gene is then inserted into a suitable disarmed
Agrobacterium strain such as A. tumefaciens ACO.

Transformation and Regeneration of Potato 

Sterile shoot cultures of Russet Burbank are maintained in vials containing 10 ml of PM medium
(Murashige and Skoog (MS) inorganic salts, 30 g/l sucrose, 0.17 g/l NaH2PO4·H2O, 0.4 mg/l thiamine-HCl,
and 100 mg/l myo-inositol, solidified with 1 g/l Gelrite at pH 6.0). When shoots reach approximately 5 cm
in length, stem internode segments of 7-10 mm are excised and smeared at the cut ends with a disarmed
Agrobacterium tumefaciens vector containing the synthetic B.t.t. gene from a four day old plate culture.
The stem explants are co-cultured for three days at 23°C on a sterile filter paper placed over 1.5 ml of a
tobacco cell feeder layer overlaid on 1/10 P medium (1/10 strength MS inorganic salts and organic addenda
without casein as in Jarret et al. (1980), 30 g/l sucrose and 8.0 g/l agar). Following co-culture the explants
are transferred to full strength P-1 medium for callus induction, composed of MS inorganic salts, organic
additions as in Jarret et al. (1980) with the exception of casein, 3.0 mg/l benzyladenine (BA), and 0.01 mg/l
naphthaleneacetic acid (NAA) (Jarret et al., 1980). Carbenicillin (500 mg/l) is included to inhibit bacterial
growth, and 100 mg/l kanamycin is added to select for transformed cells. After four weeks the explants are
transferred to medium of the same composition but with 0.3 mg/l gibberellic acid (GA3) replacing the BA
and NAA (Jarret et al., 1981) to promote shoot formation. Shoots begin to develop approximately two weeks
after transfer to shoot induction medium; these are excised and transferred to vials of PM medium for
rooting. Shoots are tested for kanamycin resistance conferred by the enzyme neomycin phosphotransferase
II, by placing a section of the stem onto callus induction medium containing MS organic and inorganic salts,
30 g/l sucrose, 2.25 mg/l BA, 0.186 mg/l NAA, 10 mg/l GA3 (Webb et al., 1983) and 200 mg/l kanamycin
to select for transformed cells.

The synthetic B.t.t. gene described in Figure 12 was placed into a plant expression vector as described
in Example 5. The plasmid has the following characteristics: a synthetic BglII fragment having approximately
1800 base pairs was inserted into pMON893 in such a manner that the enhanced 35S promoter would
express the B.t.t. gene. This construct, pMON1982, was used to transform both tobacco and tomato.
Tobacco plants, selected as kanamycin resistant plants, were screened with rabbit anti-B.t.t. antibody.
Cross-reactive material was detected at levels predicted to be suitable to cause mortality to CPB. These
target insects will not feed on tobacco, but the transgenic tobacco plants do demonstrate that the synthetic
gene does improve expression of this protein to detectable levels.

Tomato plants with the pMON1982 construct were determined to produce B.t.t. protein at levels
insecticidal to CPB. In initial studies, the leaves of four plants (5190, 5225, 5328 and 5133) showed little or
no damage when exposed to CPB larvae (damage rating of 0-1 on a scale of 0 to 4 with 4 as no leaf
remaining). Under these conditions the control leaves were completely eaten. Immunological analysis of
these plants confirmed the presence of material cross-reactive with anti-B.t.t. antibody. Levels of protein
expression in these plants were estimated at approximately 1 to 5 ng of B.t.t. protein in 50 µg of total
extractable protein. A total of 17 tomato plants (17 of 65 tested) have been identified which demonstrate
protection of leaf tissue from CPB (rating of 0 or 1) and show good insect mortality.

Results similar to those seen in tobacco and tomato with pMON1982 were seen with pMON1984 in the
same plant species. pMON1984 is identical to pMON1982 except that the synthetic protease inhibitor
(CMTI) is fused upstream of the native proteolytic cleavage site. Levels of expression in tobacco were
estimated to be similar to pMON1982, between 10 and 15 ng per 50 µg of total soluble protein.

Tomato plants expressing pMON1984 have been identified which protect the leaves from ingestion by
CPB. The damage rating was 0 with 100% insect mortality.

Potato was transformed as described in Example 5 with a vector similar to pMON1982 containing the
enhanced CaMV35S/synthetic B.t.t. gene. Leaves of potato plants transformed with this vector were
screened by CPB insect bioassay. Of the 35 plants tested, leaves from 4 plants, 16a, 13c, 13d and 23a,
were totally protected when challenged. Insect bioassays with leaves from three other plants, 13e, 1a, and
13b, recorded damage levels of 1 on a scale of 0 to 4 with 4 being total devastation of the leaf material.
Immunological analysis confirmed the presence of B.t.t. cross-reactive material in the leaf tissue. The level
of B.t.t. protein in leaf tissue of plant 16a (damage rating of 0) was estimated at 20-50 ng of B.t.t. protein/50
µg of total soluble protein. The level of B.t.t. protein seen in 16a tissue was consistent with its biological
activity. Immunological analysis of 13e and 13b (tissue which scored 1 in damage rating) revealed less protein
(5-10 ng/50 µg of total soluble protein) than in plant 16a. Cuttings of plant 16a were challenged with 50 to
200 eggs of CPB in a whole plant assay. Under these conditions 16a showed no damage and 100%
mortality of insects while control potato plants were heavily damaged.



Example 6 - Synthetic B.t.k. P2 Protein Gene

The P2 protein is a distinct insecticidal protein produced by some strains of B.t. including B.t.k. HD-1. It
is characterized by its activity against both lepidopteran and dipteran insects (Yamamoto and Iizuka, 1983).
Genes encoding the P2 protein have been isolated and characterized (Donovan et al., 1988). The P2
proteins encoded by these genes are approximately 600 amino acids in length. These proteins share only
limited homology with the lepidopteran specific P1 type proteins, such as the B.t.k. HD-1 and HD-73
proteins described in previous examples.

The P2 proteins have substantial activity against a variety of lepidopteran larvae including cabbage
looper, tobacco hornworm and tobacco budworm. Because they are active against agronomically important
insect pests, the P2 proteins are a desirable candidate in the production of insect tolerant transgenic plants
either alone or in combination with the other B.t. toxins described in the above examples. In some plants,
expression of the P2 protein alone might be sufficient to provide protection against damaging insects. In
addition, the P2 proteins might provide protection against agronomically important dipteran pests. In other
cases, expression of P2 together with the B.t.k. HD-1 or HD-73 protein might be preferred. The P2 proteins
should provide at least an additive level of insecticidal activity when combined with the crystal protein toxin
of B.t.k. HD-1 or HD-73, and the combination may even provide a synergistic activity. Although the mode of
action of the P2 protein is unknown, its distinct amino acid sequence suggests that it functions differently
from the B.t.k. HD-1 and HD-73 type of proteins. Production of two insect tolerance proteins with different
modes of action in the same plant would minimize the potential for development of insect resistance to B.t.
proteins in plants. The lack of substantial DNA homology between P2 genes and the HD-1 and HD-73
genes minimizes the potential for recombination between multiple insect tolerance genes in the plant
chromosome.

The genes encoding the P2 protein, although distinct in sequence from the B.t.k. HD-1 and HD-73
genes, share many common features with these genes. In particular, the P2 protein genes have a high A+T
content (65%), multiple potential polyadenylation signal sequences (26) and numerous ATTTA sequences
(10). Because of its overall similarity to the poorly expressed wild-type B.t.k. HD-1 and HD-73 genes, the
same problems are expected in expression of the wild-type P2 gene as were encountered with the previous
examples. Based on the above-described method for designing the synthetic B.t. genes, a synthetic P2
gene has been designed which gene should be expressed at adequate levels for protection in plants. A
comparison of the wild-type and synthetic P2 genes is shown in Figure 13.
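
The three sequence features cited for the wild-type gene (A+T content, potential polyadenylation signals and ATTTA sequences) are straightforward to tabulate for any coding sequence. The sketch below is illustrative only: the motif tuple `POLYA_SIGNALS` is an assumed two-member subset rather than the full list of potential signals used in the method, and `demo` is a made-up sequence, not part of any B.t. gene.

```python
# Assumption: an illustrative subset of potential plant polyadenylation signals.
POLYA_SIGNALS = ("AATAAA", "AATAAT")


def at_content(seq):
    """Fraction of A and T nucleotides in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def count_motif(seq, motif):
    """Count occurrences of a motif, including overlapping ones."""
    seq = seq.upper()
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def summarize(seq):
    """Tabulate the features associated with poor expression in plants."""
    return {
        "at_content": at_content(seq),
        "polya_signals": sum(count_motif(seq, m) for m in POLYA_SIGNALS),
        "attta": count_motif(seq, "ATTTA"),
    }


# Hypothetical 24-nucleotide sequence used only to exercise the functions.
demo = "ATGAATTTAAATAAAGATTTATAA"
print(summarize(demo))
```

Running `summarize` on the wild-type and synthetic versions of a gene gives a quick before-and-after comparison of the features that the redesign is meant to reduce.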


Example 7 - Synthetic B.t. entomocidus Gene

The B.t. entomocidus ("Btent") protein is a distinct insecticidal protein produced by some strains of
B.t. bacteria. It is characterized by its high level of activity against some lepidopterans that are relatively
insensitive to B.t.k. HD-1 and HD-73, such as Spodoptera species including beet armyworm (Visser et al.,
1988). Genes encoding the Btent protein have been isolated and characterized (Honee et al., 1988). The
Btent proteins encoded by these genes are approximately the same length as B.t.k. HD-1 and HD-73.
These proteins share only 68% amino acid homology with the B.t.k. HD-1 and HD-73 proteins. It is likely
that only the N-terminal half of the Btent protein is required for insecticidal activity, as is the case for HD-1
and HD-73. Over the first 625 amino acids, Btent shares only 38% amino acid homology with HD-1 and
HD-73.

Because of their higher activity against Spodoptera species that are relatively insensitive to HD-1 and
HD-73, the Btent proteins are a desirable candidate for the production of insect tolerant transgenic plants
either alone or in combination with the other B.t. toxins described in the above examples. In some plants
production of Btent alone might be sufficient to control the agronomically important pests. In other plants,
the production of two distinct insect tolerance proteins would provide protection against a wider array of
insects. Against those insects where both proteins are active, the combination of the B.t.k. HD-1 or HD-73
type protein plus the Btent protein should provide at least additive insecticidal efficacy, and may even
provide a synergistic activity. In addition, because of its distinct amino acid sequence, the Btent protein
may have a different mode of action than HD-1 or HD-73. Production of two insecticidal proteins in the
same plant with different modes of action would minimize the potential for development of insect resistance
to B.t. proteins in plants. The relative lack of DNA sequence homology with the B.t.k. type genes minimizes
the potential for recombination between multiple insect tolerance genes in the plant chromosome.

The genes encoding the Btent protein, although distinct in sequence from the B.t.k. HD-1 and HD-73
genes, share many common features with these genes. In particular, the Btent protein genes have a high
A+T content (62%), multiple potential polyadenylation signal sequences (39 in the full length coding
sequence and 27 in the first 1875 nucleotides that are likely to encode the active toxic fragment) and
numerous ATTTA sequences (16 in the full length coding sequence and 12 in the first 1875 nucleotides).
Because of its overall similarity to the poorly expressed wild-type B.t.k. HD-1 and HD-73 genes, the
wild-type Btent genes are expected to exhibit similar problems in expression as were encountered with the
wild-type HD-1 and HD-73 genes. Based on the above-described method used for designing the other synthetic
B.t. genes, a synthetic Btent gene has been designed which gene should be expressed at adequate levels
for protection in plants. A comparison of the wild-type and synthetic Btent genes is shown in Figure 14.
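
The core idea behind the synthetic genes, changing the nucleotide sequence without changing the encoded protein, can be illustrated with a small greedy routine that removes ATTTA sequences by synonymous codon substitution. This is a simplified sketch under stated assumptions, not the design method of this specification: it uses the standard genetic code, ignores plant codon preferences and polyadenylation signals, and the function `remove_motif` and the sequence `wild` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def count_motif(seq, motif="ATTTA"):
    """Count occurrences of a motif, including overlapping ones."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def remove_motif(seq, motif="ATTTA"):
    """Greedily swap in synonymous codons whenever that lowers the motif count."""
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    improved = True
    while improved:
        improved = False
        for idx in range(len(codons)):
            best = count_motif("".join(codons), motif)
            if best == 0:
                return "".join(codons)
            amino_acid = CODON_TABLE[codons[idx]]
            for alt, aa in CODON_TABLE.items():
                if aa != amino_acid:
                    continue  # only synonymous codons keep the protein unchanged
                trial = codons[:idx] + [alt] + codons[idx + 1:]
                n = count_motif("".join(trial), motif)
                if n < best:
                    codons, best, improved = trial, n, True
    return "".join(codons)


# Hypothetical short coding sequence containing two ATTTA sequences.
wild = "ATGAATTTAGATTTATCTTAA"
synthetic = remove_motif(wild)
print(wild, count_motif(wild), translate(wild))
print(synthetic, count_motif(synthetic), translate(synthetic))
```

A practical design would also weigh codon usage in the target plant and screen for the other problem sequences discussed above; the greedy search here only demonstrates that the protein sequence is preserved while the motif is eliminated.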


Example 8 - Synthetic B.t.k. Genes for Expression in Corn

High level expression of heterologous genes in corn cells has been shown to be enhanced by the
presence of a corn gene intron (Callis et al., 1987). Typically these introns have been located in the 5'
untranslated region of the chimeric gene. It has been shown that the CaMV 35S promoter and the NOS 3'
end function efficiently in the expression of heterologous genes in corn cells (Fromm et al., 1986).

Referring to Figure 15, a plant expression cassette vector (pMON744) was constructed that contains
these sequences. Specifically the expression cassette contains the enhanced CaMV 35S promoter followed
by intron 1 of the corn Adh1 gene (Callis et al., 1987). This is followed by a multilinker cloning site for
insertion of coding sequences; this multilinker contains a BglII site among others. Following the multilinker is
the NOS 3' end. pMON744 also contains the selectable marker gene 35S/NPTII/NOS 3' for kanamycin
selection of transgenic corn cells. In addition, pMON744 has an E. coli origin of replication and an ampicillin
resistance gene for selection of the plasmid in E. coli.

Five B.t.k. coding sequences described in the previous examples were inserted into the BglII site of
pMON744 for corn cell expression of B.t.k. The coding sequences inserted and resulting vectors were:

1. Wild-type B.t.k. HD-1 from pMON9921 to make pMON8652.

2. Modified B.t.k. HD-1 from pMON5370 to make pMON8642.

3. Synthetic B.t.k. HD-1 from pMON5377 to make pMON8643.

4. Synthetic B.t.k. HD-73 from pMON5390 to make pMON8644.

5. Synthetic full length B.t.k. HD-73 from pMON10518 to make pMON10902.

pMON8652 (wild-type B.t.k. HD-1) was used to transform corn cell protoplasts and stably transformed
kanamycin resistant callus was isolated. B.t.k. mRNA in the corn cells was analyzed by nuclease S1
protection and found to be present at a level comparable to that seen with the same wild-type coding
sequence (pMON9921) in transgenic tomato plants.

pMON8652 and pMON8642 (modified HD-1) were used to transform corn cell protoplasts in a transient
expression system. The level of B.t.k. mRNA was analyzed by nuclease S1 protection. The modified HD-1
gave rise to a several fold increase in B.t.k. mRNA compared to the wild-type coding sequence in the
transiently transformed corn cells. This indicated that the modifications introduced into the B.t.k. HD-1 gene
are capable of enhancing B.t.k. expression in monocot cells as was demonstrated for dicot plants and cells.

pMON8642 (modified HD-1) and pMON8643 (synthetic HD-1) were used to transform Black Mexican
Sweet (BMS) corn cell protoplasts by PEG-mediated DNA uptake, and stably transformed corn callus was
selected by growth on kanamycin containing plant growth medium. Individual callus colonies that were
derived from single transformed cells were isolated and propagated separately on kanamycin containing
medium.

To assess the expression of the B.t.k. genes in these cells, callus samples were tested for insect
toxicity by bioassay against tobacco hornworm larvae. For each vector, 96 callus lines were tested by
bioassay. Portions of each callus were placed on sterile water agar plates, and five neonate tobacco
hornworm larvae were added and allowed to feed for 4 days. For pMON8643, 100% of the larvae died after
feeding on 15 of the 96 calli and these calli showed little feeding damage. For pMON8642, only 1 of the 96
calli was toxic to the larvae. This showed that the B.t.k. gene was being expressed in these samples at
insecticidal levels. The observation that significantly more calli containing pMON8643 were toxic than for
pMON8642 showed that significantly higher levels of expression were obtained when the synthetic HD-1
coding sequence was contained in corn cells than when the modified HD-1 coding sequence was used,
similar to the previous examples with dicot plants. A semiquantitative immunoassay showed that the
pMON8643 toxic samples had significantly higher B.t.k. protein levels than the pMON8642 toxic sample.
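
The text does not report a statistical test for the difference between 15 of 96 and 1 of 96 toxic calli. As an illustration only, a one-sided Fisher exact test can be computed from the hypergeometric distribution; the function name and the choice of test are assumptions added here and are not part of the original analysis.

```python
from math import comb


def fisher_one_sided(a, n1, b, n2):
    """One-sided Fisher exact p-value.

    a of n1 samples are positive in group 1 and b of n2 in group 2.
    Returns the probability, with all margins fixed, that group 1 contains
    at least `a` of the positives.
    """
    total_positive = a + b
    total = n1 + n2
    upper = min(total_positive, n1)
    numerator = sum(
        comb(total_positive, k) * comb(total - total_positive, n1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, n1)


# Toxic calli: 15 of 96 for pMON8643 (synthetic), 1 of 96 for pMON8642 (modified).
p_value = fisher_one_sided(15, 96, 1, 96)
print(f"one-sided p = {p_value:.2e}")
```

A p-value well below 0.001 under this test is consistent with the statement that significantly more calli containing the synthetic gene were toxic.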

The 16 callus samples that were toxic to tobacco hornworm were also tested for activity against
European corn borer. European corn borer is approximately 40-fold less sensitive to the HD-1 gene product
than is tobacco hornworm. Larvae of European corn borer were applied to the callus samples and allowed
to feed for 4 days. Two of the 16 calli tested, both of which contained pMON8643 (synthetic HD-1), were
toxic to European corn borer larvae.

To assess the expression of the B.t.k. genes in differentiated corn tissue, another method of DNA
delivery was used. Young leaves were excised from corn plants, and DNA samples were delivered into the
leaf tissue by microprojectile bombardment. In this system, the DNA on the microprojectiles is transiently
expressed in the leaf cells after bombardment. Three DNA samples were used, and each DNA was tested
in triplicate.

1. pMON744, the corn expression vector with no B.t.k. gene.

2. pMON8643 (synthetic HD-1).

3. pMON752, a corn expression vector for the GUS gene, no B.t.k. gene.

The leaves were incubated at room temperature for 24 hours. The pMON752 samples were stained with
a substrate that allows visual detection of the GUS gene product. This analysis showed that over one
hundred spots in each sample were expressing the GUS product and that the triplicate samples showed
very similar levels of GUS expression. For the pMON744 and pMON8643 samples, 5 larvae of tobacco
hornworm were added to each leaf and allowed to feed for 48 hours. All three samples bombarded with
pMON744 showed extensive feeding damage and no larval mortality. All three samples bombarded with
pMON8643 showed no evidence of feeding damage and 100% larval mortality. The samples were also
assayed for the presence of B.t.k. protein by a qualitative immunoassay. All of the pMON8643 samples had
detectable B.t.k. protein. These results demonstrated that the synthetic B.t.k. gene was expressed in
differentiated corn plant tissue at insecticidal levels.



Example 9 - Synthetic Potato Leaf Roll Virus Coat Protein Gene 

Expression in plants of the coat protein genes from a variety of plant viruses has proven to be an
effective method of engineering resistance to these viruses. In order to achieve virus resistance, it is
important to express the viral coat protein at an effective level. For many plant virus coat protein genes, this
has not proved to be a problem. However, for the coat protein gene from potato leaf roll virus (PLRV),
expression of the coat protein has been observed to be low relative to other coat protein genes, and this
lower level of protein has not led to optimal resistance to PLRV.

The gene for PLRV coat protein is shown in Figure 16. Referring to Figure 16, the upper line of
sequence shows the gene as it was originally engineered for plant expression in vector pMON893. The
gene was contained on a 749 nucleotide BglII-EcoRI fragment with the coding sequence contained between
nucleotides 20 and 643. This fragment also contained 19 nucleotides of 5' noncoding sequence and 104
nucleotides of 3' noncoding sequence. This PLRV coat protein gene was relatively poorly expressed in
plants compared to other viral coat protein genes.

A synthetic gene was designed to improve plant expression of the PLRV coat protein. Referring again
to Figure 16, the changes made in the synthetic PLRV gene are shown in the lower line. This gene was
designed to encode exactly the same protein as the naturally occurring gene. Note that the beginning of the
synthetic gene is at nucleotide 14 and the end of the sequence is at nucleotide 654. The coding sequence
for the synthetic gene is from nucleotide 20 to 643 of the figure. The changes indicated just upstream and
downstream of these endpoints serve only to introduce convenient restriction sites just outside the coding
sequence. Thus the size of the synthetic gene is 641 nucleotides, which is smaller than the naturally
occurring gene. The synthetic gene is smaller because substantially all of the noncoding sequence at both
the 5' and 3' ends, except for segments encoding the BglII and EcoRI restriction sites, has been removed.
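The sizes quoted above follow from simple coordinate arithmetic. The short sketch below checks them, assuming that the nucleotide positions given for Figure 16 are 1-based and inclusive; the helper name span_length is hypothetical and is used only for illustration.

```python
# Assumption: positions quoted for Figure 16 are 1-based and inclusive.

def span_length(start: int, end: int) -> int:
    """Number of nucleotides in an inclusive, 1-based interval."""
    return end - start + 1

synthetic_gene_len = span_length(14, 654)  # whole synthetic gene
coding_len = span_length(20, 643)          # coding sequence within it
five_prime_flank = span_length(14, 19)     # upstream of the coding sequence
three_prime_flank = span_length(644, 654)  # downstream of the coding sequence

# The coding sequence should be a whole number of codons.
codon_count, remainder = divmod(coding_len, 3)

print(synthetic_gene_len, coding_len, codon_count, remainder)
print(five_prime_flank + coding_len + three_prime_flank)
```

Under that assumption the synthetic gene spans 641 nucleotides, of which 624 (208 triplets) are coding sequence, leaving only short flanks for the restriction sites.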

The synthetic gene differs from the naturally occurring gene in two main respects. First, 41 individual
codons within the coding sequence have been changed to remove nearly all codons for a given amino acid
that constitute less than about 15% of the codons for that amino acid in a survey of dicot plant genes.
Second, the 5' and 3' noncoding sequences of the original gene have been removed. Although the design
does not strictly conform to the algorithm described in Figure 1, a few of the codon changes and especially
the removal of the long 3' noncoding region are consistent with this algorithm.
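The codon criterion described above can be sketched as a simple screen. The code below is a minimal illustration, not the procedure actually used: the function names are hypothetical, and the usage fractions in TOY_USAGE are placeholder values chosen only to exercise the code, not data from the survey of dicot plant genes.

```python
def split_codons(cds: str) -> list[str]:
    """Split a coding sequence into triplets, ignoring case and spaces."""
    cds = cds.upper().replace(" ", "")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]

def rare_codons(cds: str, usage: dict[str, float],
                threshold: float = 0.15) -> list[tuple[int, str]]:
    """Return (codon index, codon) for codons used for less than
    `threshold` of the occurrences of their amino acid.

    `usage` maps a codon to its fraction among the synonymous codons for
    the same amino acid. Codons missing from `usage` are not flagged.
    """
    return [(i, codon)
            for i, codon in enumerate(split_codons(cds))
            if usage.get(codon, 1.0) < threshold]

# PLACEHOLDER fractions for two alanine codons (hypothetical, not survey data).
TOY_USAGE = {"GCT": 0.40, "GCG": 0.05}

print(rare_codons("ATGGCGGCT", TOY_USAGE))
```

With a real codon usage table supplied as `usage`, the flagged positions would be candidates for replacement by a more frequently used synonymous codon, which leaves the encoded protein unchanged.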

The original PLRV sequence contains two potential plant polyadenylation signals (AACCAA and
AAGCAT), and both of these occur in the 3' noncoding sequence that has been removed in the synthetic
gene. The original PLRV gene also contains an ATTTA sequence. This is also contained in the 3'
noncoding sequence, and is in the midst of the longest stretch of uninterrupted A+T in the gene (a stretch
of 7 A+T nucleotides). This sequence was removed in the synthetic gene. Thus, sequences that the
algorithm of Figure 1 targets for change have been changed in the synthetic PLRV coat protein gene by
removal of the 3' noncoding segment. Within the coding sequence, codon changes were also made to
remove three other regions of sequence described above. In particular, two regions of 5 consecutive A+T
and one region of 5 consecutive G+C within the coding sequence have been removed in the synthetic
gene.
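The sequence features discussed above (potential polyadenylation signals, ATTTA sequences, and runs of consecutive A+T or G+C) can be located with a simple scan. The following is a minimal sketch under stated assumptions: the signal list is the one recited in Claim 1, the run length of 5 is taken from the regions mentioned above, and the function names and the test sequence are hypothetical.

```python
import re

# Potential plant polyadenylation signals as recited in Claim 1.
POLYA_SIGNALS = (
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA",
    "ATAAAA", "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA",
    "ATTAAA", "AATTAA", "AATACA", "CATAAA",
)

def find_motif(seq: str, motif: str) -> list[int]:
    """0-based start positions of every occurrence, overlaps included."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]

def find_runs(seq: str, letters: str, min_len: int) -> list[tuple[int, int]]:
    """(start, length) of each maximal run made only of `letters`
    that is at least `min_len` nucleotides long."""
    pattern = f"[{letters}]{{{min_len},}}"
    return [(m.start(), len(m.group())) for m in re.finditer(pattern, seq)]

def scan(seq: str, min_run: int = 5) -> dict:
    """Report signals, ATTTA sequences and A+T / G+C runs in a sequence."""
    seq = seq.upper().replace(" ", "")
    return {
        "polyA": {s: find_motif(seq, s) for s in POLYA_SIGNALS if s in seq},
        "ATTTA": find_motif(seq, "ATTTA"),
        "AT_runs": find_runs(seq, "AT", min_run),
        "GC_runs": find_runs(seq, "GC", min_run),
    }

# Hypothetical test sequence, not taken from Figure 16.
report = scan("GGCAACCAAGGCATTTAGCCGGGCC")
print(report)
```

Applied to a natural and a synthetic version of the same gene, a report of this kind gives a quick way to confirm which targeted features remain in each.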

The synthetic PLRV coat protein gene is cloned in a plant transformation vector such as pMON893 and
used to transform potato plants as described above. These plants express the PLRV coat protein at higher
levels than achieved with the naturally occurring gene, and these plants exhibit increased resistance to
infection by PLRV.

Example 10 - Expression of Synthetic B.t. Genes with RUBISCO Small Subunit Promoters and Chloroplast
Transit Peptides

The genes in plants encoding the small subunit of RUBISCO (SSU) are often highly expressed, light
regulated and sometimes show tissue specificity. These expression properties are largely due to the
promoter sequences of these genes. It has been possible to use SSU promoters to express heterologous
genes in transformed plants. Typically a plant will contain multiple SSU genes, and the expression levels
and tissue specificity of different SSU genes will be different. The SSU proteins are encoded in the nucleus
and synthesized in the cytoplasm as precursors that contain an N-terminal extension known as the
chloroplast transit peptide (CTP). The CTP directs the precursor to the chloroplast and promotes the uptake
of the SSU protein into the chloroplast. In this process, the CTP is cleaved from the SSU protein. These
CTP sequences have been used to direct heterologous proteins into chloroplasts of transformed plants.

The SSU promoters might have several advantages for expression of B.t.k. genes in plants. Some SSU
promoters are very highly expressed and could give rise to expression levels as high as or higher than those
observed with the CaMV35S promoter. The tissue distribution of expression from SSU promoters is different
from that of the CaMV35S promoter, so for control of some insect pests, it may be advantageous to direct
the expression of B.t.k. to those cells in which SSU is most highly expressed. For example, although
relatively constitutive, in the leaf the CaMV35S promoter is more highly expressed in vascular tissue than in
some other parts of the leaf, while most SSU promoters are most highly expressed in the mesophyll cells of
the leaf. Some SSU promoters also are more highly tissue specific, so it could be possible to utilize a
specific SSU promoter to express B.t.k. in only a subset of plant tissues, if for example B.t. expression in
certain cells was found to be deleterious to those cells. For example, for control of Colorado potato beetle in
potato, it may be advantageous to use SSU promoters to direct B.t.t. expression to the leaves but not to the
edible tubers.

Utilizing SSU CTP sequences to localize B.t. proteins to the chloroplast might also be advantageous.
Localization of the B.t. protein to the chloroplast could protect the protein from proteases found in the
cytoplasm. This could stabilize the B.t. protein and lead to higher levels of accumulation of active protein.
B.t. genes containing the CTP could be used in combination with the SSU promoter or with other promoters
such as CaMV35S.

A variety of plant transformation vectors were constructed for the expression of B.t.k. genes utilizing
SSU promoters and SSU CTPs. The promoters and CTPs utilized were from the petunia SSU11a gene
described by Turner et al. (1986) and from the Arabidopsis ats1A gene (an SSU gene) described by
Krebbers et al. (1988) and by Elionor et al. (1989). The petunia SSU11a promoter was contained on a DNA
fragment that extended approximately 800 bp upstream of the SSU coding sequence. The Arabidopsis
ats1A promoter was contained on a DNA fragment that extended approximately 1.8 kb upstream of the SSU
coding sequence. At the upstream end, convenient sites from the multilinker of pUC18 were used to move
these promoters into plant transformation vectors such as pMON893. These promoter fragments extended
to the start of the SSU coding sequence, at which point an NcoI restriction site was engineered to allow
insertion of the B.t. coding sequence, replacing the SSU coding sequence.

When SSU promoters were used in combination with their CTP, the DNA fragments extended through
the coding sequence of the CTP and a small portion of the mature SSU coding sequence, at which point an
NcoI restriction site was engineered by standard techniques to allow the in-frame fusion of B.t. coding
sequences with the CTP. In particular, for the petunia SSU11a CTP, B.t. coding sequences were fused to
the SSU sequence after amino acid 8 of the mature SSU sequence, at which point the NcoI site was placed.
The 8 amino acids of mature SSU sequence were included because preliminary in vitro chloroplast uptake
experiments indicated that uptake of B.t.k. was observed only if this segment of mature SSU was
included. For the Arabidopsis ats1A CTP, the complete CTP was included plus 24 amino acids of mature
SSU sequence plus the sequence gly-gly-arg-val-asn-cys-met-gln-ala-met, terminating in an NcoI site for
B.t. fusion. This short sequence reiterates the native SSU CTP cleavage site (between the cys and met)
plus a short segment surrounding the cleavage site. This sequence was included in order to ensure proper
uptake into chloroplasts. B.t. coding sequences were fused to this ats1A CTP after the met codon. In vitro
uptake experiments with this CTP construction and other (non-B.t.) coding sequences showed that this CTP
did target proteins to the chloroplast.

When CTPs were used in combination with the CaMV35S promoter, the same CTP segments were
used. They were excised just upstream of the ATG start sites of the CTP by engineering of BglII sites, and
placed downstream of the CaMV35S promoter in pMON893, as BglII to NcoI fragments. B.t. coding
sequences were fused as described above.

The wild type B.t.k. HD-1 coding sequence of pMON9921 (see Figure 1) was fused to the ats1A
promoter to make pMON1925 or the ats1A promoter plus CTP to make pMON1921. These vectors were
used to transform tobacco plants, and the plants were screened for activity against tobacco hornworm. No
toxic plants were recovered. This is surprising in light of the fact that toxic plants could be recovered, albeit
at a low frequency, after transformation with pMON9921, in which the B.t.k. coding sequence was expressed
from the enhanced CaMV35S promoter in pMON893, and in light of the fact that Elionor et al. (1989) report
that the ats1A promoter itself is comparable in strength to the CaMV35S promoter and approximately 10-fold
stronger when the CTP sequence is included. At least for the wild-type B.t.k. HD-1 coding sequence,
this does not appear to be the case.

A variety of plant transformation vectors were constructed utilizing either the truncated synthetic HD-73
coding sequence of Figure 4 or the full length B.t.k. HD-73 coding sequence of Figure 11. These are listed
in the table below.
Table XV

Gene Constructs with CTPs

Vector       Promoter   CTP       B.t.k. HD-73 Coding Sequence

pMON10806    En 35S     ats1A     truncated
pMON10814    En 35S     SSU11a    full length
pMON10811    SSU11a     SSU11a    truncated
pMON10819    SSU11a     none      truncated
pMON10815    ats1A      none      truncated
pMON10817    ats1A      ats1A     truncated
pMON10821    En 35S     ats1A     truncated
pMON10822    En 35S     ats1A     full length
pMON10838    SSU11a     SSU11a    full length
pMON10839    ats1A      ats1A     full length



All of the above vectors were used to transform tobacco plants. For all of the vectors containing
truncated B.t.k. genes, leaf tissue from these plants has been analyzed for toxicity to insects and B.t.k.
protein levels by immunoassay. pMON10806, 10811, 10819 and 10821 produce levels of B.t.k. protein
comparable to pMON5383 and pMON5390, which contain synthetic B.t.k. HD-73 coding sequences driven
by the En 35S promoter itself with no CTP. These plants also have the insecticidal activity expected for the
B.t.k. protein levels detected. For pMON10815 and pMON10817 (containing the ats1A promoter), the level
of B.t.k. protein is about 5-fold higher than that found in plants containing pMON5383 or 5390. These plants
also have higher insecticidal activity. Plants containing 10815 and 10817 contain up to 1% of their total
soluble leaf protein as B.t.k. HD-73. This is the highest level of B.t.k. protein yet obtained with any of the
synthetic genes.

This result is surprising in two respects. First, as noted above, the wild type coding sequences fused to
the ats1A promoter and CTP did not show any evidence of higher levels of expression than for En 35S, and
in fact had lower expression based on the absence of any insecticidal plants. Second, Elionor et al. (1989)
show that for two other genes, the ats1A CTP can increase expression from the ats1A promoter by about
10-fold. For the synthetic B.t.k. HD-73 gene, there is no consistent increase seen by including the CTP over
and above that seen for the ats1A promoter alone.

Tobacco plants containing the full length synthetic HD-73 fused to the SSU11a CTP and driven by the
En 35S promoter produced levels of B.t.k. protein and insecticidal activity comparable to pMON10518, which
does not include the CTP. In addition, for pMON10518 the B.t.k. protein extracted from plants was
observed by gel electrophoresis to contain multiple forms less than full length, apparently due to cleavage
of the C-terminal portion (not required for toxicity) in the cytoplasm. For pMON10814, the majority of the
protein appeared to be intact full length, indicating that the protein has been stabilized from proteolysis by
targeting to the chloroplast.

Example 11 - Targeting of B.t. Proteins to the Extracellular Space or Vacuole through the Use of Signal
Peptides

The B.t. proteins produced from the synthetic genes described here are localized to the cytoplasm of
the plant cell, and this cytoplasmic localization results in plants that are insecticidally effective. It may be
advantageous for some purposes to direct the B.t. proteins to other compartments of the plant cell.
Localizing B.t. proteins in compartments other than the cytoplasm may result in less exposure of the B.t.
proteins to cytoplasmic proteases, leading to greater accumulation of the protein and yielding enhanced
insecticidal activity. Extracellular localization could lead to more efficient exposure of certain insects to the
B.t. proteins, leading to greater efficacy. If a B.t. protein were found to be deleterious to plant cell function,
then localization to a noncytoplasmic compartment could protect these cells from the protein.

In plants as well as other eucaryotes, proteins that are destined to be localized either extracellularly or
in several specific compartments are typically synthesized with an N-terminal amino acid extension known
as the signal peptide. This signal peptide directs the protein to enter the compartmentalization pathway, and
it is typically cleaved from the mature protein as an early step in compartmentalization. For an extracellular
protein, the secretory pathway typically involves cotranslational insertion into the endoplasmic reticulum with
cleavage of the signal peptide occurring at this stage. The mature protein then passes through the Golgi body
into vesicles that fuse with the plasma membrane, thus releasing the protein into the extracellular space.
Proteins destined for other compartments follow a similar pathway. For example, proteins that are destined
for the endoplasmic reticulum or the Golgi body follow this scheme, but they are specifically retained in the
appropriate compartment. In plants, some proteins are also targeted to the vacuole, another membrane
bound compartment in the cytoplasm of many plant cells. Vacuole targeted proteins diverge from the
above pathway at the Golgi body where they enter vesicles that fuse with the vacuole.

A common feature of this protein targeting is the signal peptide that initiates the compartmentalization
process. Fusing a signal peptide to a protein will in many cases lead to the targeting of that protein to the
endoplasmic reticulum. The efficiency of this step may depend on the sequence of the mature protein itself
as well. The signals that direct a protein to a specific compartment rather than to the extracellular space are
not as clearly defined. It appears that many of the signals that direct the protein to specific compartments
are contained within the amino acid sequence of the mature protein. This has been shown for some vacuole
targeted proteins, but it is not yet possible to define these sequences precisely. It appears that secretion
into the extracellular space is the "default" pathway for a protein that contains a signal sequence but no
other compartmentalization signals. Thus, a strategy to direct B.t. proteins out of the cytoplasm is to fuse
the synthetic B.t. genes to DNA sequences encoding known plant signal peptides. These fusion
genes will give rise to B.t. proteins that enter the secretory pathway, and lead to extracellular secretion or
targeting to the vacuole or other compartments.

Signal sequences for several plant genes have been described. One such sequence is for the tobacco
pathogenesis related protein PR1b described by Cornelissen et al. The PR1b protein is normally localized
to the extracellular space. Another type of signal peptide is contained on seed storage proteins of legumes.
These proteins are localized to the protein body of seeds, which is a vacuole like compartment found in
seeds. A signal peptide DNA sequence for the beta subunit of the 7S storage protein of common bean
(Phaseolus vulgaris), PvuB, has been described by Doyle et al. Based on these published
sequences, genes were synthesized by chemical synthesis of oligonucleotides that encoded the signal
peptides for PR1b and PvuB. The synthetic genes for these signal peptides corresponded exactly to the
reported DNA sequences. Just upstream of the translational initiation codon of each signal peptide a BamHI
and BglII site were inserted with the BamHI site at the 5' end. This allowed the insertion of the signal
peptide encoding segments into the BglII site of pMON893 for expression from the En 35S promoter. In
some cases, to achieve secretion or compartmentalization of heterologous proteins, it has proved necessary
to include some amino acid sequence beyond the normal cleavage site of the signal peptide. This may be
necessary to ensure proper cleavage of the signal peptide. For PR1b the synthetic DNA sequence also
included the first 10 amino acids of mature PR1b. For PvuB the synthetic DNA sequence included the first
13 amino acids of mature PvuB. Both synthetic signal peptide encoding segments ended with NcoI sites to
allow fusion in frame to the methionine initiation codon of the synthetic B.t. genes.

Four vectors encoding synthetic B.t.k. HD-73 genes were constructed containing these signal peptides.
The synthetic truncated HD-73 gene from pMON5383 was fused with the signal peptide sequence of PvuB
and incorporated into pMON893 to create pMON10827. The synthetic truncated HD-73 gene from
pMON5383 was also fused with the signal peptide sequence of PR1b to create pMON10824. The full length
synthetic HD-73 gene from pMON10518 was fused with the signal peptide sequence of PvuB and
incorporated into pMON893 to create pMON10828. The full length synthetic HD-73 gene from pMON10518
was also fused with the signal peptide sequence of PR1b and incorporated into pMON893 to create
pMON10825.

These vectors were used to transform tobacco plants and the plants were assayed for expression of the
B.t.k. protein by Western blot analysis and for insecticidal efficacy. pMON10824 and pMON10827 produced
amounts of B.t.k. protein in leaf comparable to the truncated HD-73 vectors, pMON5383 and pMON5390.
pMON10825 and pMON10828 produced full length B.t.k. protein in amounts comparable to pMON10518. In
all cases, the plants were insecticidally active against tobacco hornworm.
Claims

1. In a method for improving the expression of a heterologous gene in plants by modifying the structural
coding sequence of said gene, the improvement which comprises reducing the occurrence of polyadenylation
signals selected from the group consisting of AATAAA, AATAAT, AACCAA, ATATAA, AATCAA,
ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and
CATAAA.

2. The method of Claim 1 further comprising the improvement of reducing the occurrence of ATTTA
sequences within the structural coding sequence.

3. A method for modifying a wild-type structural gene sequence which encodes an insecticidal protein
of Bacillus thuringiensis to enhance the expression of said protein in plants which comprises:

a) removing polyadenylation signals contained in said wild-type gene while retaining a sequence
which encodes said protein; and

b) removing ATTTA sequences contained in said wild-type gene while retaining a sequence which
encodes said protein.

4. A method of Claim 3 further comprising the removal of self-complementary sequences and
replacement of such sequences with nonself-complementary DNA comprising plant preferred codons while
retaining a structural gene sequence encoding said protein.

5. A method of Claim 4 further comprising the use of plant preferred sequences in the removal of the
polyadenylation signals and ATTTA sequences.

6. A method of Claim 3 in which the polyadenylation signals are selected from the group consisting of
AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT,
ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.

7. A method of Claim 4 in which the polyadenylation signals are selected from the group consisting of
AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT,
ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.

8. A method of Claim 5 in which the polyadenylation signals are selected from the group consisting of
AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT,
ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.

9. A method for modifying a wild-type structural gene sequence which encodes an insecticidal protein
of Bacillus thuringiensis to enhance the expression of said protein in plants which comprises:

a) identifying regions within said sequence with greater than four consecutive adenine or thymine
nucleotides;

b) modifying the regions of step (a) which have two or more polyadenylation signals within a ten base
sequence to remove said signals while maintaining a gene sequence which encodes said protein; and

c) modifying the 15-30 base regions surrounding the regions of step (a) to remove major plant
polyadenylation signals, consecutive sequences containing more than one minor polyadenylation signal and
consecutive sequences containing more than one ATTTA sequence while maintaining a gene sequence
which encodes said protein.

10. A method of Claim 9 in which the major plant polyadenylation signals are selected from the group
consisting of AATAAA and AATAAT.

11. A method of Claim 10 in which the polyadenylation signals are selected from the group consisting
of AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT,
ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.

12. A method of Claim 11 further comprising the use of plant preferred sequences in the removal of
polyadenylation signals and ATTTA sequences.

13. A structural gene which encodes an insecticidal protein of Bacillus thuringiensis, said gene being
substantially devoid of polyadenylation signals and ATTTA sequences.

14. A structural gene of Claim 13 which is substantially devoid of polyadenylation signals selected from
the group consisting of AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, ATGAAA,
AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.
15. A structural gene of Claim 13 which encodes an insecticidal protein of B.t.k. HD-1 having the
sequence:

1 ATGGCTATAGAAACTGGTTACACCCCAATCGATATTTCCT 40 
. • • • 

4 1 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 8 0 

• • • • 

8 1 TGCTGGATTTGTGTTAGG ACTAGTTGATATT ATCTGGGGA 120 

. • • • 

121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 1 60 

• • • • 

161 TTGAACAGCTCATCAACCAGAGAATCGAAGAGTTCGCTAG 200 

t • ♦ • 

201 G AATCAAGCC ATTTCTAGATTAGAAGGACTAAGCAATCTT 240 

• • • • 

241 T ATC AAATTTACGCAGAATCTTTT AGAGAGTGGGAAGCAG 280 

. • • • 

281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 320 

• • • • 

321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 360 

• ♦ • • 

3 61 CTTTTTGCAGTTCAAAATTATCAAGTTCCTCTCCTCTCCG 400 

401 TGTACGTTCAAGCTGCCAACCTCCACCTCTCAGTTTTGAG 440 

• ♦ • ■ 

441 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC 480 

• • • • 

481 GCGACT ATC AATAGTCGTTATAATGATTTAACTAGGCTT A 520 
521 TTGGCAACTATACAGATCATGCTGTACGCTGGTACAATAC

561 GGGATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGG

601 ATCAGGTACAACCAGTTCAGAAGAGAGCTTACACTAACTG 

64 1 TATTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAG 

681 AACGTATCCAATTCGAAC AGTTTCCCAATT AACAAG AGAA 

721 ATTTATACAAACCCAGTATT AG AAAATTTTG ATGGT AGTT 

761 TTCGAGGCTCGGCTC AGGGCATAG AAGGAAGTATTAGGAG 

• • • • 

801 TCCACATTTGATGGATATACTTAATAGTATAACCATCTAT 

841 ACGGATGCTCATAGAGGAGAATACTACTGGTCCGGTCACC 

881 AGATC ATGGCTTCTCCTGTAGGGTTTTCGGGGCC AGAATT 

921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 

961 C AACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 

. • • • 

1001 GAACATTATCGTCCACCTTATATAGAAGACCTTTTAACAT 

1041 CGGGATC AACAACCAACAACTATCTGTTCTTGACGGGACA 

1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 
• • • •

1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160 

• • • » 

1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200 

• • • • 

1201 AGTCATCGATTAAGCC ATGTTTC AATGTTTCGTTCAGGCT 1240 

• • • • 

1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280 

• • • • 

128 1 CTCTTGGATACATCGTAGTGCTGAGTTCAACAACATCATC 1320 

• • • • 

1321 CCTTC ATCACAAATCACCCAAATCCCACTCACCAAGTCTA 1360 

• • • • 

1361 CTAATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGG 1400 

• • • • 

1401 ATTTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGC 1440 

• ■ • * 

1441 C AGATTTC AACCTTAAGAGTAAATATTACTGCACCATT AT 1480 

• • * * 

1481 CACAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCAC 1520 

• • • • 

1521 AAACCTTCAGTTCCACACATCAATTGACGGAAGACCTATT 1560 

» • * " 

1561 AATC AGGGGAATTTTTC AGCAACT ATGAGT AGTGGG AGTA 1600 

• • • " 

1601 ATTTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTAC 1640 

• • • • 

1641 TCCGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTA 1680 

I • • • 

1681 AGTGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAG 1720 

1721 ATCGAATTGAATTTGTTCCGGCA 1743. 

16. A structural gene of Claim 13 which encodes an insecticidal protein of B.t.k. HD-73 having the
sequence:

• • • •

1 ATGGCCATTGAAACCGGTTACACTCCCATCGACATCTCCT 4 0 

41 TGTCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGG 80 

• • • • 
8 1 TGCTGGGTTCGTTCTCGG ACTAGTTGACATC ATCTGGGGT 120 

• • • • 

121 ATCTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAA 160 

• • • • 
161 TTGAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAG 200 

• • • • 

201 GAACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTC 240 
. • • • 

241 TACCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCG 280 
. • • • 

281 ATCCTACT AACCC AGCTCTCCGCGAGGAAATGCGTATTCA 320 

• • • • 
321 ATTCAACGACATGAACAGCGCCTTGACCACAGCTATCCCA 360

» • • - 

361 TTGTTCGCAGTCC AGAACTACCAAGTTCCTCTCTTGTCCG 400 

• • • • 

401 TGTACGTTCAAGC AGCTAATCTTC ACCTCAGCGTGCTTCG 440 

• • • •

441 AGACGTT AGCGTGTTTGGGCAAAGGTGGGG ATTCGATGCT 

• • • • 
481 GCAACCATCAATAGCCGTTACAACGACCTTACTAGGCTGA 

521 TTGGAAACTACACCGACCACGCTGTTCGTTGGTACAACAC 

561 TGGCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGG 

• • • * 
60 1 ATTAGATACAACC AGTTCAGGAGAGAATTG ACCCTC ACAG 

• . • • 
641 TTTTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAG 

• • • • 
681 AACCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAA 

721 ATCTATACTAACCCAGTTCTTGAGAACTTCGACGGT AGCT 
. • • • 

7 61 TCCGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAG 

• • • • 
801 C CCAC ACTTG ATGGACATCTTGAACAGCATAACTATCTAC 

841 ACCGATGCTC ACAGAGGAGAGTATTACTGGTCTGGACACC 
. • • • 

881 AGATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTT 

• • • • 
921 TACCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCA 

• • • • 
961 CAACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACA 

• • • • 
1001 G AACCTTGTCTTCC ACCTTGT AC AG AAGACCCTTCAATAT 






EP 0 385 962 A1 

• • • • 
1041 CGGTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACA 

• • • " 

1081 GAGTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTG 
. • • • 

1121 TTTACAGAAAGAGCGGAACCGTTG ATTCCTTGGACG AAAT 

1161 CCCACCACAGAACAACAATGTGCC ACCCAGGCAAGGATTC 

1201 TCCCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGAT 

1241 TCAGC AACAGTTCCGTGAGCATC ATCAGAGCTCCTATGTT 

• - • • 
1281 CTCTTGGATACACCGTAGTGCTGAGTTCAACAACATCATC 

. • • • 

1321 GCATCCGATAGTATT ACTC AAATC CCTGC AGTG AAGGG AA 

• • • • 
1361 ACTTTCTCTTCAACGGTTCTGTCATTTCAGGACC AGGATT 

. • • • 

1401 CACTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAAT 

1441 AACATTCAG AATAGAGGGTATATTGAAGTTCCAATTCACT 

• • • • 
1481 TCCCATCC ACATCTACCAGAT ATAGAGTTCGTGTGAGGTA 

• • • * 
152 1 TGCTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGT 

• • • • 
1561 AATTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTA 

• • • • 
1601 CCTCCTTGGATAATCTCCAATCCAGCGATTTCGGTT ACTT 
• • • •

1641 TGAAAGTGCCAATGCTTTTACATCTTC ACT CGGTAACATC 1680 
• • • • 

1681 GTGGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTA 1720 
. • • • 

1721 TCGACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGA 17 60 

17 61 GGCTGAG 1767. 

17. A structural gene of Claim 13 encoding an insecticidal protein of B.t.k. HD-1 having the sequence:

• • • • 

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40 

• • * . • 

4 1 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGG AGA 8 0 

• • • 

8 1 ACGCATTGAAACCGGTTACACTCCCATCGAC ATCTCCTTG 120 
« • • • 

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160

. • • • 

161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 2 00 

• • • • 

201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240 

• • • • 

241 GAGC AGTTGATC AACCAGAGGATCG AAGAGTTCGCC AGGA 280 

. * • • 

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320 

• . • • 

321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360 
• • • •
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400 

• • a • 

401 TCAACGACATGAAC AGCGCCTTGACC ACAGCTATCCC ATT 440 

• • • • 
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480 

481 TACGTTC AAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520 

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560 

561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600

601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640

• * • • 

641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680 
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720 

• • • • 

721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760 

• • • • 
7 61 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800 

• • • • 

801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840 

• • • • 

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880 
881 C AC ACTTGATGGAC ATCTTG AAC AGCATAACT ATCT ACAC 920 
961 ATCATGGCCTCTCCAGTTGGATTC AGCGGGCCCGAGTTTA 1000

• • • • 

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CATGGATTCATCGTAGTGCTGAGTTCAACAATATCATTCC 1400
1401 TTCCTCTCAAATCACCCAAATCCCATTGACCAAGTCTACT 1440
1441 AACCTTGGATCTGGAACTTCTGTCGTGAAAGGACCAGGCT 1480
1481 TCACAGGAGGTGATATTCTTAGAAGAACTTCTCCTGGCCA 1520
1521 GATTAGCACCCTCAGAGTTAACATCACTGCACCACTTTCT 1560
1561 CAAAGATATCGTGTCAGGATTCGTTACGCATCTACCACTA 1600
1601 ACTTGCAATTCCACACCTCCATCGACGGAAGGCCTATCAA 1640
1641 TCAGGGTAACTTCTCCGCAACCATGTCAAGCGGCAGCAAC 1680
1681 TTGCAATCCGGCAGCTTCAGAACCGTCGGTTTCACTACTC 1720
1721 CTTTCAACTTCTCTAACGGATCAAGCGTTTTCACCCTTAG 1760
1761 CGCTCATGTGTTCAATTCTGGCAATGAAGTGTACATTGAC 1800
1801 CGTATTGAGTTTGTGCCTGCCGAAGTTACCTTCGAGGCTG 1840
1841 AGTAC 1845.

18. A structural gene of Claim 13 encoding an insecticidal protein derived from B.t.k. HD-73 having the
sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840

• • • • 

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400

1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTAATGCG 1880
1881 CTGTTTACGTCTACAAACCAGCTTGGACTCAAGACAAATG 1920.

19. A structural gene of Claim 13 encoding the full-length insecticidal protein of B.t.k. HD-73 having the
sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840

• • • • 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880
1881 GCTGTTTACGTCTACAAACCAGCTCGGCCTCAAGACCAAT 1920
1921 GTGACGGATTATCATATTGATCAAGTGTCCAACTTGGTGA 1960
1961 CCTACCTCAGCGATGAGTTCTGTCTGGATGAAAAGCGAGA 2000
2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040
2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080
2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120
2121 TACCATCCAGGGAGGTGACGACGTGTTCAAGGAGAACTAC 2160
2161 GTCACACTATCAGGTACCTTTGATGAGTGtTATCCAACAT 2200
2201 ACCTCTACCAGAAGATCGACGAGTCCAAGTTGAAAGCCTT 2240
2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280
2281 GACCTCGAGATCTACCTCATCCGCTACAATGCAAAACATG 2320
2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360
2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400
2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440
2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480
2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520
2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560
2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600
2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640
2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAGAAGT 2680
2681 TGGAATGGGAGACCAACATCGTCTACAAAGAGGCAAAAGA 2720
2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760
2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800
2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840
2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880
2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTCTACG 2920
2921 ATGCCAGAAACGTCATCAAGAACGGTGACTTCAACAATGG 2960
2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000
3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080
3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120
3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160
3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200
3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240
3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280
3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320
3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360
3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400
3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440
3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480
3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520
3521 TCCTTATGGAGGAA 3534.

20. A structural gene of Claim 13 encoding a full-length insecticidal protein of B.t.k. HD-73 having the
sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40

• • . • • 

41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720

1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880
1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920
1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960
1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000
2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040
2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080
2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120
2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160
2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200
2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240
2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280
2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360
2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400
2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440
2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480
2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520
2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560
2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600
2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640
2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680
2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720
2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760
2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800
2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840
2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880
2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920

2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960
2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000
3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040
3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080
3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120
3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160
3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200
3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240
3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280
3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320
3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360
3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400
3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440
3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480
3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520
3521 TCCTTATGGAGGAA 3534.

21. A structural gene of Claim 13 encoding a full-length insecticidal protein of B.t.k. HD-73 having the
sequence:


1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440

441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
. t • • 

1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGC 1880
1881 CCTCTTTACCTCCACCAATCAGCTTGGCTTGAAAACTAAC 1920
1921 GTTACTGACTATCACATTGACCAAGTGTCCAACTTGGTCA 1960
1961 CCTACCTTAGCGATGAGTTCTGCCTCGACGAGAAGCGTGA 2000
2001 ACTCTCCGAGAAAGTTAAACACGCCAAGCGTCTCAGCGAC 2040
2041 GAGAGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCA 2080
2081 ACAGGCAGCCAGAACGTGGTTGGGGTGGAAGCACCGGGAT 2120
2121 CACCATCCAAGGAGGCGACGATGTGTTCAAGGAGAACTAC 2160
2161 GTCACCCTCTCCGGAACTTTCGACGAGTGCTACCCTACCT 2200
2201 ACTTGTACCAGAAGATCGATGAGTCCAAACTCAAAGCCTT 2240
2241 CACCAGGTATCAACTTAGAGGCTACATCGAAGACAGCCAA 2280

2281 GACCTTGAAATCTACTCGATCAGGTACAATGCCAAGCACG 2320
2321 AGACCGTGAATGTCCCAGGTACTGGTTCCCTCTGGCCACT 2360
2361 TTCTGCCCAATCTCCCATTGGGAAGTGTGGAGAGCCTAAC 2400
2401 AGATGCGCTCCACACCTTGAGTGGAATCCTGACTTGGACT 2440
2441 GCTCCTGCAGGGATGGCGAGAAGTGTGCCCACCATTCTCA 2480
2481 TCACTTCTCCTTGGACATCGATGTGGGATGTACTGACCTG 2520
2521 AATGAGGACCTCGGAGTCTGGGTCATCTTCAAGATCAAGA 2560
2561 CCCAAGACGGACACGCAAGACTTGGCAACCTTGAGTTTCT 2600
2601 CGAAGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTG 2640
2641 AAGAGAGCAGAGAAGAAGTGGAGGGACAAACGTGAGAAAC 2680
2681 TCGAATGGGAAACTAACATCGTTTACAAGGAGGCCAAAGA 2720
2721 GTCCGTGGATGCTTTGTTCGTGAACTCCCAATATGATCAG 2760
2761 TTGCAAGCCGACACCAACATCGCCATGATCCACGCCGCAG 2800
2801 ACAAACGTGTGCACAGCATTCGTGAGGCTTACTTGCCTGA 2840

2841 GTTGTCCGTGATCCCTGGTGTGAACGCTGCCATCTTCGAG 2880
2881 GAACTTGAGGGACGTATCTTTACCGCATTCTCCTTGTACG 2920
2921 ATGCCAGAAACGTCATCAAGAACGGTGACTTCAACAATGG 2960
2961 CCTCAGCTGCTGGAATGTGAAAGGTCATGTGGACGTGGAG 3000
3001 GAACAGAACAATCAGCGTTCCGTCCTGGTTGTGCCTGAGT 3040
3041 GGGAAGCTGAAGTGTCCCAAGAGGTTAGAGTCTGTCCAGG 3080
3081 TAGAGGCTACATTCTCCGTGTGACCGCTTACAAGGAGGGA 3120
3121 TACGGTGAGGGTTGCGTGACCATCCACGAGATCGAGAACA 3160
3161 ACACCGACGAGCTTAAGTTCTCCAACTGCGTCGAGGAAGA 3200
3201 AATCTATCCCAACAACACCGTTACTTGCAACGACTACACT 3240
3241 GTGAATCAGGAAGAGTACGGAGGTGCCTACACTAGCCGTA 3280
3281 ACAGAGGTTACAACGAAGCTCCTTCCGTTCCTGCTGACTA 3320
3321 TGCCTCCGTGTACGAGGAGAAATCCTACACAGATGGCAGA 3360
3361 CGTGAGAACCCTTGCGAGTTCAACAGAGGTTACAGGGACT 3400
3401 ACACACCACTTCCAGTTGGCTATGTTACCAAGGAGCTTGA 3440
3441 GTACTTTCCTGAGACCGACAAAGTGTGGATCGAGATCGGT 3480
3481 GAAACCGAGGGAACCTTCATCGTGGACAGCGTGGAGCTTC 3520
3521 TCTTGATGGAGGAA 3534.

22. A structural gene of Claim 13 which encodes an insecticidal protein of B.t.t. having the sequence:

1 ATGACTGCAGACAACAACACCGAAGCCCTCGACAGTTCTA 40
41 CCACTAAGGATGTTATCCAGAAGGGTATCTCCGTTGTGGG 80
81 AGACCTCTTGGGCGTGGTTGGATTTCCCTTCGGTGGAGCC 120
121 CTCGTGAGCTTCTATACAAACTTTCTCAACACCATTTGGC 160
161 CAAGCGAGGACCCTTGGAAAGCATTCATGGAGCAAGTTGA 200
201 AGCTCTTATGGATCAGAAGATTGCAGATTATGCCAAGAAC 240
241 AAGGCTTTGGCAGAACTCCAGGGCCTTCAGAACAATGTGG 280
281 AGGACTACGTGAGTGCATTGTCCAGCTGGCAGAAGAACCC 320
321 TGTTAGCTCCAGAAATCCTCACAGCCAAGGTAGGATCAGA 360
361 GAGTTGTTCTCTCAAGCCGAATCCCACTTCAGAAATTCCA 400

401 TGCCTAGCTTTGCTATCTCCGGTTACGAGGTTCTTTTCCT 440
441 CACTACCTATGCTCAAGCTGCCAACACCCACTTGTTTCTC 480
481 CTTAAGGACGCTCAAATCTATGGAGAAGAGTGGGGATACG 520
521 AGAAAGAGGACATTGCTGAGTTCTACAAGCGTCAACTTAA 560
561 GCTCACCCAAGAGTACACTGACCATTGCGTGAAATGGTAT 600
601 AACGTTGGTCTCGATAAGCTCAGAGGCTCTTCCTACGAGT 640
641 CTTGGGTGAACTTCAACAGATACAGGAGAGAGATGACCTT 680
681 GACTGTGCTCGATCTTATCGCACTCTTTCCCTTGTACGAT 720
721 GTGAGACTCTACCCAAAGGAAGTGAAAACTGAGCTTACCA 760
761 GAGACGTGCTCACTGACCCTATTGTCGGAGTCAACAACCT 800
801 TAGGGGTTATGGAACTACCTTCAGCAATATCGAAAACTAC 840
841 ATTAGGAAACCACATCTCTTCGACTATCTTCACAGAATTC 880
881 AATTCCACACAAGGTTTCAACCAGGATACTATGGTAACGA 920
921 CTCCTTCAACTATTGGTCCGGTAACTATGTTTCCACCAGA 960
961 CCAAGCATTGGATCTAATGACATCATCACATCTCCCTTCT 1000


1601 GAGCACCCTTCAACCAGTATTACTTTGACAAGACCATCAA 1640
1641 CAAAGGTGACACTCTCACATACAATAGCTTCAACTTGGCA 1680
1681 AGTTTCAGCACACCATTTGAACTCTCAGGCAACAATCTTC 1720
1721 AGATCGGCGTCACCGGTCTCAGCGCCGGAGACAAAGTCTA 1760
1761 CATCGACAAGATTGAGTTCATCCCAGTGAAC 1791.

23. A structural gene of Claim 13 which encodes an insecticidal protein of B.t. entomocidus having the
sequence:

• ■ • " 

1 ATGGAGGAGAACAACCAAAACCAATGCATTCCATACAACT 40
41 GCTTGAGTAACCCAGAAGAGGTATTGCTTGATGGAGAACG 80
81 CATTTCAACCGGTAACTCTTCCATCGACATCTCCTTGTCC 120
121 TTGGTCCAGTTTCTGGTCAGCAACTTCGTGCCAGGTGGTG 160
161 GGTTCCTTGTCGGACTAATTGACTTCGTCTGGGGTATCGT 200
201 TGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATTGAG 240
241 CAGTTGATCAACGAGAGGATCGCTGAGTTCGCCAGGAACG 280
281 CTGCCATCGCTAACTTGGAAGGATTGGGCAATAACTTCAA 320
321 CATCTATGTGGAGGCCTTCAAAGAGTGGGAAGAGGACCCT 360

361 AACAACCCAGAGACCCGCACTAGGGTGATCGACAGATTCA 400
401 GAATCTTGGACGGCCTCTTGGAGAGAGATATCCCATCCTT 440
441 CAGAATCTCTGGCTTCGAAGTTCCTCTCTTGTCCGTGTAC 480
481 GCTCAAGCAGCTAATCTTCACCTCGCTATCCTTCGAGACA 520
521 GTGTCATCTTTGGGGAAAGGTGGGGATTGACCACTATCAA 560
561 CGTCAATGAGAATTACAACAGACTTATCAGGCACATTGAC 600
601 GAGTACGCCGACCACTGTGCTAACACCTACAACCGTGGCT 640
641 TGAACAATCTCCCTAAGTCTACTTATCAAGATTGGATTAC 680
681 CTACAACAGGTTGAGGAGAGACTTGACCCTCACAGTTTTG 720
721 GACATTGCAGCTTTCTTCCCGAACTATGACAACAGGAGAT 760
761 ACCCTATCCAACCAGTGGGTCAACTTACCAGAGAAGTCTA 800
801 TACTGACCCACTTATCAACTTCAACCCTCAGTTGCAAAGT 840
841 GTCGCCCAACTTCCCACATTCAACGTCATGGAGTCCAGCC 880
881 GTATCAGGAACCCACACTTGTTTGACATCTTGAACAACCT 920

921 TACTATCTTCACCGATTGGTTCAGCGTTGGGCGTAACTTC 960
961 TATTGGGGTGGACACAGGGTCATCTCCTCTCTTATTGGAG 1000
1001 GTGGGAACATTACCTCTCCTATCTATGGACGTGAGGCAAA 1040
1041 CCAGGAGCCACCACGTAGTTTCACCTTCAACGGTCCAGTC 1080
1081 TTCAGAACCTTGTCTAACCCTACCTTGAGATTGCTCCAGC 1120
1121 AACCTTGGCCAGCTCCACCTTTCAACtTTAGAGGTGTTGA 1160
1161 GGGCGTTGAGTTCTCTACTCCTACCAACTCCTTCACTTAC 1200
1201 AGAGGTAGAGGAACCGTTGATTCCTTGACCGAACTCCCAC 1240
1241 CAGAGGACAATAGCGTGCCACCCAGGGAAGGCTACTCCCA 1280
1281 CAGGTTGTGCCACGCAACCTTCGTGCAGCGTTCCGGAACT 1320
1321 CCATTCCTCACTACAGGAGTTGTGTTCTCATGGACTGATC 1360
1361 GTAGTGCTACTCTCACTAATACCATTGATCCCGAGAGGAT 1400
1401 CAATCAAATCCCATTGGTCAAGGGTTTCCGTGTGTGGGGA 1440
1441 GGAACTTCTGTCATCACAGGACCAGGCTTCACAGGAGGTG 1480
1481 ATATTCTTAGAAGAAACACTTTTGGCGACTTTGTGAGCCT 1520
1521 CCAAGTTAACATCAACTCTCCAATTACTCAAAGATATCGT 1560

1561 CTCAGGTTTCGTTACGCATCTTCCCGTGACGCTAGAGTCA 1600
1601 TCGTGCTCACCGGAGCAGCTTCTACCGGTGTCGGTGGACA 1640
1641 AGTCTCCGTGT^CATGCCACTCCAGAAGACTATGGAGATC 1680
1681 GGCGAGAACTTGACATCCAGGACCTTCAGATACACCGACT 1720
1721 TCTCTAACCCTTTCAGTTTCCGTGCCAACCCTGACATCAT 1760
1761 TGGCATTAGCGAACAACCTCTCTTTGGAGCTGGTAGCATC 1800
1801 TCATCTGGCGAATTGTACATTGACAAGATTGAGATCATTC 1840
1841 TTGCCGACGCTACCTTCGAGGCTGAGTCTGACCTTGAGAG 1880
1881 AGCCCAGAAGGCTGTGAACGCCCTCTTTACCTCCTCTAAT 1920
1921 CAGATTGGCTTGAAAACTGACGTTACTGACTATCACATTG 1960
1961 ACCAAGTGTCCAACTTGGTCGACTGCCTTAGCGATGAGTT 2000
2001 CTGCCTCGACGAGAAGCGTGAACTCTCCGAGAAAGTTAAA 2040
2041 CACGCCAAGCGTCTCAGCGACGAGAGGAATCTCTTGCAAG 2080
2081 ACCCCAACTTCAGAGGCATCAACAGGCAGCCAGACCGTGG 2120
2121 TTGGAGAGGAAGCACCGACATCACCATCCAAGGAGGCGAC 2160
2161 GATGTGTTCAAGGAGAACTACGTCACCCTCCCAGGAACTG 2200
2201 TGGACGAGTGCTACCCTACCTACTTGTACCAGAAGATCGA 2240
2241 TGAGTCCAAACTCAAAGCCTACACCAGGTATGAACTTAGA 2280
2281 GGCTACATCGAAGACAGCCAAGACCTTGAAATCTACCTCA 2320
2321 TCAGGTACAATGCCAAGCACGAGATCGTGAATGTCCCAGG 2360
2361 TACTGGTTCCCTCTGGCCACTTTCTGCCCAAATGCCCATT 2400
2401 GGGAAGTGTGGAGAGCCTAACAGATGCGCTCCACACCTTG 2440
2441 AGTGGAATCCTGACTTGGACTGCTCCTGCAGGGATGGCGA 2480
2481 GAAGTGTGCCCACCATTCTCATCACTTCACCTTGGACATC 2520
2521 GATGTGGGATGTACTGACCTGAATGAGGACCTCGGAGTCT 2560
2561 GGGTCATCTTCAAGATCAAGACCCAAGACGGACACGCAAG 2600
2601 ACTTGGCAACCTTGAGTTTCTCGAAGAGAAACCATTGCTC 2640
2641 GGTGAAGCTCTCGCTCGTGTGAAGAGAGCAGAGAAGAAGT 2680
2681 GGAGGGACAAACGTGAGAAACTCCAACTCGAGACTAACAT 2720

2721 CGTTTACAAGGAGGCCAAAGAGTCCGTGGATGCTTTGTTC 2760
2761 GTGAACTCCCAATATGATAGGTTGCAAGTGGACACCAACA 2800
2801 TCGCCATGATCCACGCTGCAGACAAACGTGTGCACAGGAT 2840
2841 TCGTGAGGCTTACTTGCCTGAGTTGTCCGTGATCCCTGGT 2880
2881 GTGAACGCTGCCATCTTCGAGGAACTTGAGGGACGTATCT 2920
2921 TTACCGCATACTCCTTGTACGATGCCAGAAACGTCATCAA 2960
2961 GAACGGTGACTTCAACAATGGCCTCTTGTGCTGGAATGTG 3000
3001 AAAGGTCATGTGGACGTGGAGGAACAGAACAATCACCGTT 3040
3041 CCGTCCTGGTTATCCCTGAGTGGGAAGCTGAAGTGTCCCA 3080
3081 AGAGGTTAGAGTCTGTCCAGGTAGAGGCTACATTCTCCGT 3120
3121 GTGACCGCTTACAAGGAGGGATACGGTGAGGGTTGCGTGA 3160
3161 CCATCCACGAGATCGAGGACAACACCGACGAGCTTAAGTT 3200
3201 CTCCAACTGCGTCGAGGAAGAAGTCTATCCCAACAACACC 3240
3241 GTTACTTGCAACAACTACACTGGGACCCAGGAAGAGTACG 3280
3281 AAGGTACCTACACTAGCCGTAACCAAGGTTACGACGAAGC 3320

3321 TTACGGAAACAATCCTTCCGTTCCTGCTGACTATGCCTCC 3360
3361 GTGTACGAGGAGAAATCCTACACAGATGGCAGACGTGAGA 3400
3401 ACCCTTGCGAGTCCAACAGAGGTTACGGTGACTACACACC 3440
3441 ACTTCCAGCAGGCTATGTTACCAAGGACCTTGAGTACTTT 3480
3481 CCTGAGACCGACAAAGTGTGGATCGAGATCGGTGAAACCG 3520
3521 AGGGAACCTTCATCGTGGACAGCGTGGAGCTTCTCTTGAT 3560
3561 GGAGGAA 3567.

24. A structural gene of Claim 13 which encodes a P2 insecticidal protein having the sequence:

• • • • 

1 ATGGACAACAACGTCTTGAACTCTGGTAGAACAACCATCT 40
41 GCGACGCATACAACGTCGTGGCTCACGATCCATTCAGCTT 80
81 CGAACACAAGAGCCTCGACACTATTCAGAAGGAGTGGATG 120
121 GAATGGAAACGTACTGACCACTCTCTCTACGTCGCACCTG 160
161 TGGTTGGAACAGTGTCCAGCTTCCTTCTCAAGAAGGTCGG 200
201 CTCTCTCATCGGAAAACGTATCTTGTCCGAACTCTGGGGT 240

241 ATCATCTTTCCATCTGGGTCCACTAATCTCATGCAAGACA 280
281 TCTTGAGGGAGACCGAACAGTTTCTCAACCAGCGTCTCAA 320
321 CACTGATACCTTGGCTAGAGTCAACGCTGAGTTGATCGGT 360
361 CTCCAAGCAAACATTCGTGAGTTCAACCAGCAAGTGGACA 400
401 ACTTCTTGAATCCAACTCAGAATCCTGTGCCTCTTTCCAT 440
441 CACTTCTTCCGTGAACACTATGCAGCAACTCTTCCTCAAC 480
481 AGATTGCCTCAGTTTCAGATTCAAGGCTACCAGTTGCTCC 520
521 TTCTTCCACTCTTTGCTCAGGCTGCCAACATGCACTTGTC 560
561 CTTCATACGTGACGTGATCCTCAACGCTGACGAATGGGGA 600
601 ATCTCTGCAGCCACTCTTAGGACATACAGAGACTACTTGA 640
641 GGAACTACACTCGTGATTACTCCAACTATTGCATCAACAC 680
681 TTATCAGACTGCCTTTCGTGGACTCAATACTAGGCTTCAC 720
721 GACATGCTTGAGTTCAGGACCTACATGTTCCTTAACGTGT 760
761 TTGAGTACGTCAGCATTTGGAGTCTCTTCAAGTACCAGAG 800
801 CTTGATGGTGTCCTCTGGAGCCAATCTCTACGCCTCTGGC 840
841 AGTGGACCACAGCAAACTCAGAGCTTCACAGCTCAGAACT 880

• • • • 

881 GGCCATTCTTGTATAGCTTGTTCCAAGTCAACTCCAACTA 920
921 CATTCTCAGTGGTATCTCTGGGACCAGACTCTCCATAACC 960
961 TTTCCCAACATTGGTGGACTTCCAGGCTCCACTACAACCC 1000
1001 ATAGCCTTAACTCTGCCAGAGTGAACTACAGTGGAGGTGT 1040
1041 CAGCTCTGGATTGATTGGTGCAACTAACTTGAACCACAAC 1080
1081 TTCAATTGCTCCACCGTCTTGCCACCTCTGAGCACACCGT 1120
1121 TTGTGAGGTCCTGGCTTGACAGCGGTACTGATCGCGAAGG 1160
1161 AGTTGCTACCTCTACAAACTGGCAAACCGAGTCCTTCCAA 1200
1201 ACCACTCTTAGCCTTCGGTGTGGAGCTTTCTCTGCACGTG 1240
1241 GGAATTCAAACTACTTTCCAGACTACTTCATTAGGAACAT 1280
1281 CTCTGGTGTTCCTCTCGTCATCAGGAATGAAGACCTCACC 1320
1321 CGTCCACTTCATTACAACCAGATTAGGAACATCGAGTCTC 1360
1361 CATCCGGTACTCCAGGAGGTGCAAGAGCTTACCTCGTGTC 1400
1401 TGTCCATAACAGGAAGAACAACATCTACGCTGCCAACGAG 1440
1441 AATGGCACCATGATTCACCTTGCACCAGAAGATTACACTG 1480

1481 G ATTCACCATCTCTCCAATCCATGCTACCCAAGTGAACAA 1520 

1521 TCAGAC ACGC ACCTTC ATCTCCGAAAAGTTCGG AAATC AA 1560 

1561 GGTGACTCCTTGAGGTTCG AGCAATCC AAC ACT ACCGCTA 1600 
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1601 GGTACACTTTGAGAGGCAATGGAAACAGCT AC AACCTTTA 1 6,4 0 

1641 CTTGAGAGTTAGCTCCATTGGTAACTCCACCATCCGTGTT 1680 

1681 ACCATCAACGGACGTGTTTACACAGTCTCTAATGTGAACA 1720 

1721 CTACAACGAACAATGATGGCGTTAACGACAACGGAGCCAG 1760 

1761 ATTCAGCGACATCAACATTGGCAACATCGTGGCCTCTGAC 1800

• - • ' 

1801 AACACTAACGTTACTTTGGACATCAATGTGACCCTCAATT 1840 

1841 CTGGAACTCC ATTTGATCTCATGAACATCATGTTTGTGCC 1880 

1881 AACTAACCTCCCTCCATTGTACTAA 1905. 



25. A plant transformation vector comprising a plant gene containing a structural gene of Claim 13. 

26. A structural gene sequence of Claim 13 encoding a fusion protein comprising the N-terminal 610
amino acids of B.t.k. HD-1 and the C-terminal 567 amino acids of B.t.k. HD-73, said gene having the
sequence:

• • • •
1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40

• • • • 
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80

• • • • 
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120 

« • • • ■ 

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160

• • • • 
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200 

• • • • 

201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGC AAATT 240 

• • • • 

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280

281 ACCAGGCCATCTCTAGGTTGGAAGG ATTGAGCAATCTCTA 320 

• • • • 
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360 

• • • • 
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400

• • • •

401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440




EP 0 385 962 A1 

• • • • 
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 

. • • • 

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 

• • • • 

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 
561 AACCATC AATAGCCGTTACAACGACCTT ACTAGGCTG ATT 

• • • • 

601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG

641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT

• • • • 
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 

• • • • 
721 TTGGAC ATTGTGTCTCTCTTCCCGAACTATGACTCC AGAA 

• • • • 

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT

• • " * 
801 CTATACT AACCCAGTTCTTG AGAACTTCGACGGTAGCTTC 

841 CGTGGTTCTGCCC AAGGTATCGAAGGCTCCATCAGG AGCC 

• • • • 
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 

• « • • 
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGAC ACCAG 

• • • • 
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA

• • • •

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080

• • • • 
1081 ACCTTGTCTTCCACCTTGT ACAGAAGACCCTTCAATATCG 1120 

• • « • 
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160 

• • • • - 

1161 CTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200 

• • • • 

1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240

• • • • 

1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280

• • • • 
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320 

• • • • 
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360

• • • •

1361 CATGGATTCATCGTAGTGCTGAGTTCAACAATATCATTCC 1400

• • • •

1401 TTCCTCTCAAATCACCCAAATCCCATTGACCAAGTCTACT 1440

• • • • 

1441 AACCTTGGATCTGGAACTTCTGTCGTGAAAGGACC AGGCT 1480 

• • » • 
1481 TCACAGGAGGTGATATTCTTAGAAG AACTTCTCCTGGCC A 1520 

■ • • • 
1521 GATTAGCACCCTCAGAGTTAACATC ACTGCACCACTTTCT 1560 

■ • • • 

1561 CAAAGATATCGTGTCAGGATTCGTTACGCATCTACCACTA 1600

1601 ACTTGCAATTCCACACCTCCATCGACGGAAGGCCTATCAA 1640

• •

1641 TC AGGGTAACTTCTCCGCAACCATGTC AAGCGGC AGC AAC 

• • • • 
1681 TTGCAATCCGGC AGCTTCAGAACCGTCGGTTTCACTACTC 

• • • * 

1721 CTTTCAACTTCTCTAACGGATCAAGCGTTTTCACCCTTAG 

• • • • 
1761 CGCTCATGTGTTCAATTCTGGCAATGAAGTGTAC ATTGAC 

• • • • 
1801 CGTATTGAGTTTGTGCCTGCCGAAGTTACCCTCGAGGCTG 

• • • • 
1841 AGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGCCCT 

• • • • 
1881 CTTTACCTCCACCAATCAGCTTGGCTTGAAAACTAACGTT 

• • • • 
1921 ACTGACTATCACATTGACC AAGTGTCCAACTTGGTCACCT 

1961 ACCTTAGCGATGAGTTCTGCCTCGACGAGAAGCGTGAACT 

• • • • 
2001 CTCCGAGAAAGTTAAACACGCCAAGCGTCTC AGCGACGAG 

• • • • 
2041 AGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCAACA 

. • • • 

2081 GGCAGCC AGAACGTGGTTGGGGTGG AAGCACCGGGATCAC 

• • • • 
2121 C ATCCAAGGAGGCGACGATGTGTTC AAGGAGAACTACGTC 

• • • • 
2161 ACCCTCTCCGGAACTTTCG ACGAGTGCTACCCTACCTACT 

• • * • 
2201 TGTACC AGAAG ATCGATGAGTCC AAACTCAAAGCCTTCAC 




. • • * 

2241 CAGGTATCAACTTAGAGGCTACATCGAAGACAGCCAAGAC 2280 

• • • • 
2281 CTTGAAATCTACTCGATCAGGTACAATGCCAAGCACGAGA 2320 

• • • • 
2321 CCGTGAATGTCCCAGGTACTGGTTCCCTCTGGCCACTTTC 2360 

• • • 

2361 TGCCCAATCTCCCATTGGGAAGTGTGGAGAGCCTAACAGA 2400 

• • • • 

2401 TGCGCTCCACACCTTGAGTGGAATCCTGACTTGGACTGCT 2440

2441 CCTGCAGGGATGGCGAGAAGTGTGCCCACCATTCTCATCA 2480 

• • • • 
2481 CTTCTCCTTGGAC ATCGATGTGGGATGTACTGACCTGAAT 2520 

• • • • 
2521 GAGGACCTCGGAGTCTGGGTCATCTTCAAGATCAAGACCC 2560 

. . • • 

2561 AAGACGGACACGCAAGACTTGGCAACCTTGAGTTTCTCGA 2600

• • • •

2601 AGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTGAAG 2640

2641 AGAGCAGAGAAGAAGTGGAGGGACAAACGTGAGAAACTCG 2680

• • • •

2681 AATGGGAAACTAACATCGTTTACAAGGAGGCCAAAGAGTC 2720

2721 CGTGGATGCTTTGTTCGTGAACTCCCAATATGATCAGTTG 2760

2761 CAAGCCGACACCAACATCGCCATGATCCACGCCGCAGACA 2800

• • • • 

2801 AACGTGTGCACAGCATTCGTGAGGCTTACTTGCCTGAGTT 2840

2841 GTCCGTGATCCCTGGTGTGAACGCTGCCATCTTCGAGGAA

• • • •

2881 CTTGAGGGACGTATCTTTACCGCATTCTCCTTGTACGATG

. • • • 

2921 CCAGAAACGTCATCAAGAACGGTGACTTCAACAATGGCCT 

2961 CAGCTGCTGGAATGTGAAAGGTCATGTGGACGTGGAGGAA 

• • • * 
3001 CAGAACAATCAGCGTTCCGTCCTGGTTGTGCCTGAGTGGG

• • • • 
3041 AAGCTGAAGTGTCCCAAGAGGTTAGAGTCTGTCCAGGTAG 

• • • 

3081 AGGCT AC ATTCTCCGTGTG ACCGCTT AC AAGG AGGG ATAC 

3121 GGTGAGGGTTGCGTGACCATCCACGAGATCGAGAACAACA 

• • • 
3161 CCGACGAGCTTAAGTTCTCCAACTGCGTCGAGGAAGAAAT 

• • • 

3201 CTATCCCAACAACACCGTTACTTGCAACGACTACACTGTG 

3241 AATCAGGAAGAGTACGGAGGTGCCTACACTAGCCGTAACA 
3281 G AGGTTACAACGAAGCTCCTTCCGTTCCTGCTGACT ATGC 
3321 CTCCGTGTACGAGGAGAAATCCTACACAGATGGCAGACGT 

3361 GAGAACCCTTGCGAGTTCAACAGAGGTTACAGGGACTACA 

. • • • 

3401 CACCACTTCCAGTTGGCTATGTTACCAAGGAGCTTGAGTA 

3441 CTTTCCTGAGACCGACAAAGTGTGG ATCGAG ATCGGTG AA 
3481 ACCGAGGGAACCTTC ATCGTGGAC AGCGTGGAGCTTCTCT 
3521 TGATGGAGGAA 3531.

27. A method of Claim 4 further comprising removal of sequences comprising more than five
consecutive A + T or G + C bases.

28. A structural gene sequence of Claim 13 comprising a majority of plant preferred codons. 

29. A structural gene encoding the coat protein of potato leaf roll virus, said gene having the sequence: 

1 ATGAGTACTGTCGTGGTTAAGGGAAACGTGAACGGTGGTG 40

41 TTCAACAACCTAGAAGGAGAAGAAGGCAATCCCTTCGTAG 80

• • • * 

81 GAGAGCTAACAGAGTTCAGCCAGTGGTTATGGTCACTGCT 120 

• • • • 

121 CCTGGGCAACCAAGAAGGAGAAGAAGGAGAAGAGGAGGTA 160 

• • • • 

161 ATCGCAG ATC AAGAAGAACTGGAGTTCCCAGAGGAAGAGG 200 

201 TTCAAGCGAGACATTCGTGTTTAC AAAGGAC AACCTCGTG 240 

• • • • 

241 GGCAACTCCCAAGGAAGTTTCACCTTCGGACC AAGTGTTT 280 

• • • * 

281 CAGACTGTCCAGCATTCAAGGATGGAATACTC AAGGCTTA 320 

« • « • 

321 CC ATGAGTACAAG ATCAC AAGTATCTTGCTTCAGTTCGTC 360 

• . • • • 

361 AGCGAGGCCTCTTCCACCTCTCCAGGCTCCATCGCTTATG 400 

• ■ • • 

401 AGTTAGATCCACATTGCAAAGTTTCATCCCTCCAGTCCTA 440 

• • • • 

441 CGTCAACAAGTTCCAAATCACAAAGGGTGGTGCTAAGACC 480

• • • " 

481 TATCAAGCTCGTATGATCAACGGAGTTGAATGGCACGATT 520

521 CTTCTGAGGATCAGTGCAGAATCCTTTGGAAAGGAAATGG 560

• • • • 

561 AAAGTCTTCAGATCCAGCTGGATCTTTCAG AGTTACCATC 600 

601 AGAGTTGCTCTTCAAAACCCAAAG 624. 



30. A chimeric plant gene which comprises a structural coding sequence encoding an insecticidal
protein of Bacillus thuringiensis, said structural coding sequence being modified to reduce the number of
putative polyadenylation signals within said structural coding sequence.

31. A chimeric plant gene of Claim 30 in which the polyadenylation signals are selected from the group
consisting of AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, ATGAAA, AAGCAT,
ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.

32. A chimeric plant gene of Claim 31 in which said structural coding sequence is further modified to 
reduce the number of ATTTA sequences within said structural coding sequence. 

33. A chimeric plant gene of Claim 32 in which said structural coding sequence is substantially devoid 
of polyadenylation signals and ATTTA sequences. 

34. A transformed plant cell containing a gene of Claim 33. 

35. A transformed plant cell of Claim 34 selected from the group consisting of soybean, cotton, alfalfa, 
oilseed rape, flax, tomato, sugarbeet, sunflower, potato, tobacco, maize, rice and wheat. 

36. A plant comprising transformed plant cells of Claim 34. 

37. A plant of Claim 36 which comprises plant cells of Claim 35. 

38. A seed produced by a plant of Claim 36. 
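
As an illustration of the sequence features referred to in Claims 27 and 30 to 33, the following Python sketch scans a coding sequence for the sixteen hexamers listed in Claim 31, for ATTTA sequences, and for runs of more than five consecutive A/T or G/C bases. The names find_all and scan_coding_sequence and the layout of the report are hypothetical and form no part of the claimed method. Reading "more than five consecutive A + T or G + C bases" as a run of six or more bases drawn only from A and T, or only from G and C, is an assumption.

```python
import re

# The sixteen putative polyadenylation signals recited in Claim 31.
POLYA_SIGNALS = (
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA",
    "ATAAAA", "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA",
    "ATTAAA", "AATTAA", "AATACA", "CATAAA",
)


def find_all(seq, motif):
    """Return 1-based start positions of motif in seq, overlaps included."""
    return [m.start() + 1 for m in re.finditer("(?=" + motif + ")", seq)]


def scan_coding_sequence(seq):
    """Report the features that Claims 27 and 30-33 seek to reduce.

    Hypothetical helper: anything other than A, C, G or T (spaces,
    position numbers, ambiguity codes) is dropped before scanning.
    """
    seq = re.sub(r"[^ACGTacgt]", "", seq).upper()
    polya = {s: find_all(seq, s) for s in POLYA_SIGNALS if s in seq}
    attta = find_all(seq, "ATTTA")
    # Assumed reading of Claim 27: six or more consecutive A/T or G/C bases.
    runs = [(m.start() + 1, m.group())
            for m in re.finditer(r"[AT]{6,}|[GC]{6,}", seq)]
    return {"polya": polya, "attta": attta, "runs": runs}


if __name__ == "__main__":
    report = scan_coding_sequence("GGAAT AAACC ATTTA GGCGC GCGT")
    for feature, hits in report.items():
        print(feature, hits)
```

A sequence for which all three entries come back empty contains none of these features. The sketch does not attempt to quantify "substantially devoid" as used in Claim 33.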
1 atggctatagaaactggttacaccccaatcgatatttcct 40



• • * • 

41 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 80



81 TGCTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGA 120

T C 

• • • • 

121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160 



• » * • 

161 TTGAAC AGTTAATTAACCAAAGAATAGAAGAATTCGCTAG 200 
C C C G C G 

• • • • 

201 GAACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT 240 
T 

• • • • 

241 TATCAAATTTACGCAGAATCTTTT AGAG AGTGGGAAGCAG 280 



• • • • 

281 ATCCT ACTAATCC AGCATTAAG AGAAGAG ATGCGTATTC A 320 



• t • • 

321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 3 60 



• • • • 

361 CTTTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAG 400 

CC C C 

• • - • • 

401 TATATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAG 440
G C C CC C CC C 

. • • • 

441 AGATGTTTCAGTGTTTGGACAAAGGTGGGG ATTTGATGCC 480 



♦ ♦ • • 

481 GCGACT ATCAATAGTCGTTATAATGATTTAACTAGGCTTA 520 



• • • • 

521 TTGGCAACTATACAGATC ATGCTGTACGCTGGTACAATAC 560 

• • • • 

561 GGGATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGG 600

• • * * 

601 ATAAGATATAATCAATTT AGAAGAG AATTAAC ACT AACTG 640 
CGCCGC GCT 

• • • • 

641 TATTAGATATCGTTTCTCT ATTTCCGAACTATGAT AGTAG 680 

• • • ' 

681 AACGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720 



FIGURE 2A

721 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 760

• • • • 

761 TTCGAGGCTCGGCTCAGGGC ATAGAAGGAAGT ATT AGGAG 800 

• • • • 

801 TCCACATTTGATGGATATACTTAATAGTATAACCATCTAT 840 

• • • • 

841 ACGGATGCTC AT AGAGGAGAATATTATTGGTC AGGGCATC 880 

C C C T C 

• • • • 

881 AAATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATT 920 
G C 

. • • • 

921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 960 

• # • 

961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000

• • • •

1001 GAACATTATCGTCCACCTTATATAGAAGACCTTTTAATAT 1040

C 

• • • • 

1041 AGGGATAAATAATCAACAACTATCTGTTCTTGACGGGACA 1080 

C C C C 

• • • • 

1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120 

• • • • ■ 

1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160 

1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200 

• • - • 

1201 AGTCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCT 1240 

1241 TTAGT AATAGTAGTGTAAGT ATAATAAGAGCTCCT ATGTT 1280 

1281 CTCTTGGATACATCGTAGTGCTGAATTTAATAAT ATAATT 1320 

G C C C C C 

• • • • 

1321 CCTTCATCACAAATTACACAAATACCTTTAACAAAATCTA 1360 

C C C AC C C G 

• • • * ' 
1361 CTAATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGG 1400 

FIGURE 2B

1401 ATTTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGC 1440

1441 C AGATTTCAACCTTAAGAGTAAATATTACTGC ACC ATTAT 1480 

• • • • 

1481 C ACAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCAC 1520 

• • • ♦ 

1521 AAATTTACAATTCCATACATCAATTGACGGAAGACCTATT 1560

CC T G C 

1561 AATCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTA 1600 

1601 ATTTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTAC 1640

• • • •

1641 TCCGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTA 1680

• • • •

1681 AGTGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAG 1720

1721 ATCGAATTGAATTTGTTCCGGCA 1743 
• • • •

1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40
CCA C AC 

• « • • • 
41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80

C C G A T C T ' 

• « • • 

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120
CCT C TC CC 

• • • • 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160 
CTGAG GCCCGCGA 

• • • • 

161 CTGGATTTGTGTTAGGACTAGTTGAT ATAATATGGGGAAT 200 
GCTCC CCC T 

• • • • 

201 TTTTGGTCCCTCTCAATGGG ACGCATTTCTTGTACAAATT 240 
C A T C G G 

• • • • 

241 GAACAGTTAATTAACCAAAG AATAGAAGAATTCGCT AGGA 280 
G GC GGC G C 

• . • • 

281 ACCAAGCC ATTTCT AGATTAGAAGGACTAAGC AATCTTTA 320 
G C G G T G C 

• » • • 

321 TC AAATTT ACGCAGAATCTTTTAGAGAGTGGG AAGC AG AT 3 60 
C C T GAGC C C 

» • • • 

361 CCTACTAATCC AGC ATTAAGAGAAG AG ATGCGTATTC AAT 400 
C TC CC C G A 

. - - » 

401 . TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440 
C CTGCA CAT 

. • • • 

441 TTTTGCAGTTC AAAATT ATC AAGTTCCTCTTTTATCAGTA 480 
GC CGCC CGCG 

• • ' * 

481 TATGTTCAAGCTGC AAATTT AC ATTTATCAGTTTTGAG AG 520 
C A T C T CC CAGC GC TC 

• • • 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560 
C AGC G C T 

• 

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600
AC C CCCCT G 

■ > • * 

601 GGCAACTATACAGATCATGCTGTACGCTGGTACAATACGG 640
A C.CCCC TT CT 

641 GATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGGAT 680 
C G C T T 



FIGURE 3A

681 AAGATATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720
T CCGCG GCCAT 

• • • • 

721 TTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAGAA 760
G C T G C C CTCC 

• • • • 

761 CGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800
CCTCT G CTC 

m . * • • 

801 TTATACAAACCC AGTATTAGAAAATTTTGATGGTAGTTTT 840 
C T TCTGCCC CC 

• • • • 

841 CGAGGCTCGGCTCAGGGCAT AGAAGGAAGTATTAGGAGTC 880 
T T T C A T C CTCC C C 

• • • • 

881 C ACATTTG ATGG AT AT ACTT AATAGT AT AACC ATCT AT AC 920 
C CCTGCC T C 

• • • * 

921 GGATGCTC ATAG AGGAGAAT ATTATTGGTC AGGGC ATCAA 960 
C C G C TACG 

• - • • • 

961 AT AATGGCTTCTCCTGTAGGGTTTTC GGGGCC AG AATTCA 1000 
C C ^ A T A CAGC C G T 

1001 CTTTTCCGCTAT ATGGAACT ATGGG AAATGCAGCTCC AC A 1040 
CTC C C 

. ♦ • • 

1041 ACAACGT ATTGTTGCTC AACTAGGTC AGGGCGTGT AT AGA 1080 
C T C C 

• . * ♦ • ■ 

1081 AC ATTATCGTCCACCTTATATAGAAGACCTTTTAATATAG 1120 
CGT GC CC C 

• • • • 

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160 
TCCCG TC A 

• ■ • * 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200 
G C C T T C T 

• • * * 

1201 TACAGAAAAAGCGGAACGGT AGATTCGCTGGATGAAAT AC 1240 
G C T CT C C 

• • • • 

1241 CGCC ACAG AATAAC AACGTGCC ACCT AGGC AAGGATTT AG 1280 
A C T C CTC 

> • • * 

1281 TC ATCGATTAAGCC ATGTTTCAATGTTTCGTTC AGGCTTT 1320 
CCAGG CGC C CA C 

. • • * 

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360

C C TCC G C C C 

. • • ■ 

1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTCC 1400 
AT G C C C 
• • • •

1401 TTCATCACAAATTACACAAATACCTTTAACAAAATCT ACT 1440 
CT CC CAG .CG 

• • • • 

1441 AATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGGAT 1480 
C A G C 

• • • • 

1481 TTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGCCA 1520 
C T A T 

« • • • 

1521 GATTTCAACCTTAAGAGTAAATATTACTGCACCATTATCA 1560 
AGC CC TCC CTT 

• • • • 

1561 CAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCACAA 1600

T C G T A A 

• • • • 

1601 ATTTACAATTCCATACATC AATTGACGGAAGACCTATTAA 1640 
CG* CCCC G C 

• • * • 

1641 TCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTAAT 1680 
T C C C C TCA CCCC 

t • • • ' 

1681 TTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTACTC 1720 
GA C CACC C 

• • • • 

1721 CGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTAAG 1760
TC CTC CTCCCT 

. . • • 

1761 TGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAGAT 1800
C G T G C T C 

• • • • 

1801 CGAATTGAATTTGTTCCGGCAGAAGTAACCTTTGAGGCAG 1840 
T G GTC T C T 

1841 AATAT 1845 
G C 
1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40
CCA C AC 

• • • • • 

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80
C C G AT C T 

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120
CCT C TC CC 

• • • " 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160 
CTGAG GCCCGCGA 

• • • ' 

161 CTGGATTTGTGTT AGGACTAGTTGAT ATAATATGGGG AAT 200 

GCTCC CCC T 

• • • • 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240
C A T C G G 

. • • • 

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280 

G GC GGC G C 

• • • " 

281 ACCAAGCC ATTTCT AGATTAGAAGG ACTAAGC AATCTTT A 320 
G C G G T G C 

• • • * 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360

C C T GAGC C C 

• ♦ • • 

361 CCTACTAATCC AGC ATTAAG AGAAGAGATGCGTATTC AAT 400 

C TC CC C G A 

401 TCAATGAC ATGAAC AGTGCCCTTACAACCGCT ATTCCTCT 440 
C CTGCA CAT 

• • • • * 

441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480
GC CGCC CGCG 

• • • ' 

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520 
C A T C T CC CAGC GC TC 

• • • ' 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560
C AGC G C T 

• • • * 

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 
AC C CCCCT G 

• 

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 640
A CCCCC TT CT 

641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680 
C G G C T T A 



FIGURE 4A

• • • •

681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720
TACCGCG GCCAT 

• • • " 

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 760
G C T GT C C CTCC 

• • • • 

761 GATATCCAATTCGAAC AGTTTCCCAATTAACAAGAGAAAT 800 
CCCTCT G CTC 

• • • • 

801 TTATAC AAACCCAGT ATTAG AAAATTTTGATGGTAGTTTT 840 
C T TCTGCCC CC 

• • • • 

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880 
TTTCATC G CTCC C C 

• • • • 

881 CACATTTG ATGGATATACTT AACAGTATAACC ATCTATAC 920 
C C CT G C T C 

• • • • 

921 GGATGCTCATAGGGGTTATT ATTATTGGTC AGGGCATCAA 960 
C CAAGG C TACG 

• • • • 

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000
C C ATA CAGC C G T 

• • • • 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040

CTC C C 

• • • • 

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080
C . T C C 

» • • ^ • 

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120 
CGT CGC CC C 

• • • • 

1121 GG ATAAAT AATCAACAACT ATCTGTTCTTGAC GGGAC AG A 1160 
TCCCGTC A 

• • • • 

1161 ATTTGCTTATGGAACCTCCTC AAATTTGCC ATCCGCTGT A 1200 
G C C T T C T 

• • • * 

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240 
G C T CT C C 

• • • • 

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280 
A C T C CTC 

f - • • 

1281 TC ATCGATTAAGCC ATGTTTCAATGTTTCGTTC AGGCTTT 1320 
CCAGG CGC C CAC 

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360 
C C TCC G C C C 

• • • • ' 
1361 CTTGGATACATCGT AGTGCTGAATTT AATAAT AT AATTGC 1400 

C G C C C C C 
1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
C 

■ « • • 

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480
C C C C C 

• • • • 

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520 
A C C C C C C 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560



• • • • 

1561 CC ATCGAC ATCT ACCAGATATCGAGTTCGTGT ACGGTATG 1600 
C A GA 

• * • • 

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640 
G T 

• • • • 

1641 TTCATCCATTTTTTCCAATAC AGTACCAGCTACAGCTACG 1680 
C C T C 

• • • • 

1681 TC ATTAGAT AATCT AC AATC AAGTGATTTTGGTTATTTTG 1720 
C G C C C C . C 

• • • • 

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760 

C C C C 

• • • • 

1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800 
G C T C 

• > • • 

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840 
C G C 

• • • • 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 

A TGCG 

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 
CTGT ACGTCTACA C AGCT G ACTC G CA TG 

1921 G 1921 



FIGURE 4C




• • • • 

1 GAAAGAATAGAAACTGGTTACACCCCAATCGATATTTCCT 40
ATGGCC T C T C C C 

• • • • 

41 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 80 
CTGAG GCCCGCGA 

« • > • 

81 TGCTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGA 120
GCTCC CCC T 

• • • • 

121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160
C A T C G G 

• • • • 

161 TTGAAC AGTTAATT AACC AAAGAATAGAAG AATTCGCTAG 200 
G G-C GGC G C 

• • • * 

201 GAACCAAGCCATTTCTAG ATTAGAAGGACT AAGCAATCTT 240 
G C G G T G C 

• • • ' • 

241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAG 280 
C C T GAGC C C 

• • • 

281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 320 
C TC CC C G A 

• • • * 

321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 360 
C CTGCA CA 

. * • • 

361 CTTTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAG 400 
TGC CGCC CGC 

• • • 

401 TATATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAG 440 
G C A T C T CC CAGC GC TC 

• • • • 

441 AGATGTTTC AGTGTTTGG ACAAAGGTGGGGATTTG ATGCC 480 
-C AGC G C T 

• • • * 

481 GCGACTATCAATAGTCGTTATAATGATTTAACTAGGCTTA 520 

AC C CCCCT G 

• • » ■ 

521 TTGGCAACTATACAGATTATGCTGTACGCTGGTACAATAC 560
A CCCCC TT C 

• • • • 

561 GGGATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGG 600 
T C G G C T T 

• • • • 

601 GTAAGGTATAATCAATTTAGAAGAGAATTAACACTAACTG 640
ATACCGCG GCCA 

• • • • 

641 TATTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAG 680 
T G C T GT C C CTCC 



FIGURE 8A

• • • •

681 AAGATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720
CCCTCT G CTC 

. • • • 

721 ATTT AT ACAAACCC AGTATT AGAAAATTTTG ATGGTAGTT 760 
C T TCTGCCC C 

• • • ' • 

761 TTCGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAG 800 
CTTTCATC G CTCC C 

. • • • 

801 TCCACATTTGATGGATATACTTAACAGTAT AACCATCTAT 840 
C C CCTG C T C 

• • • • 

841 ACGGATGCTC ATAGGGGTTATTATTATTGGTC AGGGCATC 880 
C CAAGG C TAG 

• • • • 

881 AAATAATGGCTTCTCCTGT AGGGTTTTCGGGGCC AGAATT 920 
G C C ATA CAGC C G 

• • • • 

921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 960
T C T C C C 

• • ■ * 

961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000

C . T C C 

• • • • 

1001 GAAC ATTATCGTCCACTTTATATAG AAGACCTTTTAATAT 1040 
CGT CGC CC 

• • • • 

1041 AGGGAT AAATAATCAACAACTATCTGTTCTTGACGGGAC A 1080 
CTCCCG TC A 

• a • • 

1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120 
G C C T T C 

• • • • 

1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160 
T G C T CT C 

1161 ACCGCC AC AGAATAAC AACGTGCC ACCTAGGC AAGG ATTT 1200 
C A C T C C 

• • • • 

1201 AGTCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCT 1240
TCC CAGG CGC C CA 

• • • * 

1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280 
C C C TCC G C C C 

1281 CTCTTGGATACATCGTAGTGCTGAATTTAATAATATAATT 1320 

C G C C C C C 

• • • * 

1321 GCATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAA 1360
C 

• » • • 

1361 ACTTTCTTTTT AATGGTTCTGTAATTTCAGGACCAGGATT 1400 
C C C C 



FIGURE 8B

1401 TACTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAAT 1440
C ACC CCCC 

• " • • • 

1441 AACATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACT 1480 



1481 TCCCATCGACATCTACCAGATATCGAGTTCGTGTACGGTA 1520 
C A GA 

• • • • 

1521 TGCTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGT 1560 

G T 

• • • • 

1561 AATTC ATCC ATTTTTTCCAAT ACAGTACCAGCTACAGCTA 1600 

C C T 

• • • • 

1601 CGTCATTAGATAATCTACAATCAAGTGATTTTGGTT ATTT 1640 
CCG C CC C C 

• • • ■ • 

1641 TGAAAGTGCC AATGCTTTTACATCTTCATTAGGTAATATA 1680 

C C C C 

• • • • 

1681 GTAGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAA 1720 
G C T 

. t • • 

1721 TAGACAGATTTGAATTTATTCCAGTTACTGCAACACTCGA 1760
C C G C 

1761 GGCTGAA 1767 
G 
1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40
CCA C AC 

. • • • 

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80

C C G A T C T 

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120 
CCT C TC CC 

. . • • 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160
CTGAG GCCCGCGA 

• • • * 

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200 
GCTCC CCC T 

• • « 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240 
C A T C G G 

» • • • 

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280
G GC GGC.G C 

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320 
G C G G T G C 

• • • 

321 TC AAATTT ACGCAGAATCTTTTAGAGAGTGGGAAGC AGAT 360 
C C T GAGC C C 

. • • • 

361 CCTACTAATCCAGCATT AAGAGAAGAGATGCGTATTCAAT 400 
C TC CC C G A 

401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440 
C CTGCA CAT 

. • • * 

441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480 
GC CGCC CGCG 

. • • • 

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520 
C A T C T CC CAGC GC TC 

. • * * 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560 
C AGC G C T 

• • • • 

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 
AC C CCCCT G 

601 GGC AACT ATAC AGATTATGCTGT ACGCTGGT ACAAT ACGG 640 
A CCCCC TT CT 

641 GATTAG AACGTGT ATGGGG ACCGG ATTCTAGAGATTGGGT 680 
C G G C T T A 

FIGURE 9A

• • • •

681 AAGGT ATAATC AATTTAG AAG AG AATT AAC ACTAACTGT A 720 
TACCGCG GC CAT 

* • • • 

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 760
G C T GT C C CTCC 

. • • • 

761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800

CCCTCT G CTC 

. • • • 

801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840 
C T TCTGCCC CC 

• 

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGT ATTAGGAGTC 880 
TTTCATC G CTCC C C 

. • • • 

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920 

C C CT G C T *C 

. • » • 

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960

C CAAGG C TACG 

961 ATAATGGCTTCTCCTGT AGGGTTTTCGGGGCCAGAATTC A 1000 
C- C ATA CAGC C G T 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCC ACA 1040 
CTC C C 

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080 
C T C C 

. • • • 

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120 
CGT CGC CC C 

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160
TCCCG TC A 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200 
G C C T T C T 

• • • • • 
1201 TACAGAAAAAG CGGAACGGT AGATTCGCTGG ATG AAAT AC 1240 

G C T CT C C 

• • • • 

1241 CGCCACAGAATAACAACGTGCC ACCTAGGCAAGGATTTAG 1280 
A C T C CTC 

• • • • 

1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320 
CCAGG CGC C CAC 

. • • ♦ 

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCT ATGTTCT 1360 

C C TCC G C C C 

1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 1400
C G C C C C C

1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
C 

• • • • 

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480 
C C C C C 

• • • 

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520

A C C C C C C 

1521 CATTC AGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560 



1561 CC ATCGAC ATCTACCAG ATATCG AGTTCGTGT ACGGTATG 1600 
C A GA 

• • • 

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640 
G . T 

• • • 

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCT ACG 1680 
C C T C 

1681 TC ATT AGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720 
C G C C C C C 

. • • • 

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760

C C C C 

. • • • 

1761 AGGTGTTAGAAATTTTAGTGGGACTGC AGGAGTGATAATA 1800 
G C T C 

• • • • 

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840

C G C . 
. • • • 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTG AATGC 1880 

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 

1921 GTAACGG ATTATC ATATTG ATCAAGTGTCCAATTT AGTT A 1960 

• • • ' 

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000 

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040 
2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080 
2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120 
• • • •

2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160 

• • • • 

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200 

• - • 

2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240 

2241 TACCCGTTATC AATTAAGAGGGT ATATCGAAGATAGTCAA 2280 

• • • 

2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320 

. • • • 

2321 AAACAGTAAATGTGCCAGGTACPGGTTCCTTATGGCCGCT 2360 

• - • • • 

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400 

2401 CGATGCGCGCC ACACCTTGAATGGAATCCTGACTTAGATT 2440 
2441 GTTCGTGTAGGG ATGGAGAAAAGTGTGCCCATCATTCGCA 2480 

2481 TCATTTCTCCTTAGAC ATTGATGTAGGATGTACAGACTTA 2520 

• • • • 

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560 

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600 

. • • • 

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640

• • • » • 
2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680 

2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720 

. • • " 

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760

2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800 

> ■ • • 

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840 

FIGURE 9D

• • • •

2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880 

• • • « 

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920 

• • • • 

2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAAT AATGG 2960 

2961 CTTATCCTGCTGGAACGTGAAAGGGC ATGT AG ATGTAGAA 3000 

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040

• • • • 

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080 

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120 

• • • » 

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160 

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200 

• • • • 

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240

4 • • • 

3241 GTAAATC AAG AAG AAT ACGG AGGTGCGTAC ACTTCTCGTA 3280 

• • • • 

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320 

3321 TGCGTC AGTCTATGAAG AAAAAT CGTATAC AGATGG ACGA 3360 

• » * * 

3361 AGAGAGAATCCTTGTG AATTTAAC AGAGGGTATAGGGATT 3400 

• • • • 

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440 

• • • * 

3441 ATACTTCCCAG AAACCGATAAGGTATGGATTGAGATTGGA 3480 

3481 GAAACGGAAGGAACATTTATCGTGGAC AGCGTGGAATTAC 3520 
3521 TCCTTATGGAGGAA 3534 

FIGURE 9E

1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40
CCA C AC 

• 

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80
C C G A T C T 

• 

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120
CCT C TC CC

. • • • 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160
CTGAG GCCCGCGA 

• • • • 

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200
GCTCC CCC T 

- * 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTAC AAATT 240 
C A T • C . G G 

241 GAAC AGTT AATTAACCAAAGAAT AGAAGAATTCGCT AGGA 280 
G GC GGC G C 

* • . • 

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTT A 320 
G C G G T G C 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360 
C C T GAGC C -C 

361 CCTACTAATCCAGC ATT AAGAGAAGAGATGCGTATTCAAT 400 
C TC CC C G A 

401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440 
C CTGCA CAT 

* 

441 TTTTGCAGTTC AAAATT ATCAAGTTCCTCTTTT ATCAGTA 480 
GC CGCC CGCG 

481 TATGTTC AAGCTGCAAATTTAC ATTTATC AGTTTTGAG AG 520 
C A T C T CC CAGC GC TC 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560 
C AGC G C T 

561 GACTATCAATAGTCGTT ATAATGATTTAACTAGGCTTATT 600 
AC C CCCCT G 

. • • ' 

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 640 
A CCCCC TT CT 
641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680
C G G C T T A 

• • • • 

681 AAGGT ATAATCAATTTAGAAGAGAATTAAC ACTAACTGTA 720 
TACCGCG GCCAT 

• • • " 

721 TTAGAT ATCGTTGCTCTGTTCCCGAATTATGATAGTAG AA 760 
G C T GT C C CTCC 

• • • . • 

761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800 
CCCTCT G CTC 

• • • • 

801 TTATAC AAACCC AGTATTAGAAAATTTTG ATGGTAGTTTT 840 
C T TCTGCCC.CC 

• • • • 

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880
TTTCATC G CTCC C C 

• • • • * 
881 CACATTTGATGGAT ATACTTAAC AGTATAACC ATCTATAC 920 

C C CT G C T C 

• • • • 

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960 
C CAAGG C TACG 

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000 
C C ATA CAGC C G T 

• • • • 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040 
CTC C C

• • • • 

1041 ACAACGTATTGTTGCTC AACTAGGTCAGGGCGTGT ATAG A 1080 
C T C C 

• • • * 

1081 AC ATT ATCGTCC ACTTT ATATAG AAGACCTTTTAATAT AG 1120 
CGT CGC CC C 

• • • * 

1121 GG ATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160 
TCCCG TC A 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200
G C C T T C T 

. • • • 

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240 
G C T CT C C 

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280 
A C T C CTC 

« • • • 

1281 TCATCGATTAAGCCATGTTTC AATGTTTCGTTC AGGCTTT 1320 
CCAGG CGC C CAC 

• • • • ■ 
1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360 

C C TCC G C C C 
1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 1400 
C G C C C C C 

• • • • 

1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440 
C 

• • • • 

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480 
C C C C C 

• • • • 

1481 CTGGTGGGGACTTAGTT AG ATTAAATAGT AGTGGAAATAA 1520 
A C C C C C C 

• • • • 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560 



1561 CCATCGACATCTACC AG ATATCG AGTTCGTGTACGGT ATG 1600 
C * A GA 

• • • • 

1601 CTTCTGTAACCCCGATTCACCTC AACGTTAATTGGGGTAA 1640 
G T 

• • * • 

1641 TTC ATCC ATTTTTTCC AAT AC AGTACC AGCT AC AGCT ACG 1680 
C C T C 

• • • • 

1681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720 
C G C C C C C 

• • • ■ 

1721 AAAGTGCCAATGCTTTT ACATCTTCATTAGGTAAT ATAGT 1760 

C C C C 

• • " • 

1761 AGGTGTTAGAAATTTTAGTGGGACTGC AGG AGTG ATAATA 1800 
G C T C 

• • • • 

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840 
C G C 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 



1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 

G C C C G C 

• • • • 

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960 
G C G G 

• • - • 

1961 CGTATTTATCGGATGAATTTTGTCTGGATG AAAAGCGAG A 2000 
C CC CAGC G C 

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040 

• • • * 

2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080 
2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120 

2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160 
G TC GCGGC 

• • • » 

2161 GTC ACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200 

• • • • 

2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240 
CCC-CGG CGCGG 

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280 

• • • ♦ 

2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320 
C C G CC C C 

• • > • 

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360 

• • • > 

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400 

• • » ■ 

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440 

• • « • 

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480 

• • • • 

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520 

2521 AATGAGGACCTAGGTGT ATGGGTGATCTTTAAGATTAAG A 2560 

t * • • 

2561 CGCAAG ATGGGC ACGC AAGACT AGGGAATCTAGAGTTTCT 2600 

• • • • 

2601 CGAAG AGAAACC ATT AGTAGGAGAAGCGCT AGCTCGTGTG 2640 

mm** 

2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680 

G G 

» • • • 

2681 TGGAATGGGAAAC AAAT ATCGTTTATAAAGAGGCAAAAGA 2720 
G C C C C 

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760 

• • • • 

2761 TTACAAGCGGAT ACGAATATTGCCATGATTCATGCGGCAG 2800 
2801 ATAAACGTGTTCATAGC ATTCGAGAAGCTTATCTGCCTGA 2840 

• • • • 

2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880 

• • • • 

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCT ATATG 2920 

C C 

2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960 
C CGC CCC 

2961 CTT ATCCTGCTGGAACGTGAAAGGGC ATGTAGATGTAGAA 3000 
3001 GAAC AAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040 
3041 GGGAAGC AGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080 

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120 

. • • ♦ 

3121 TATGGAGAAGGTTGCGTAACCATTC ATGAGATCGAGAACA 3160 

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200 

• • • * 

3201 AATCTATCCAAATAAC ACGGTAACGTGTAATGATTATACT 3240 

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280 

« • " • 

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320 

• . . • • 

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360 

3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400 

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440 

3441 ATACTTCCCAGAAACCGAT AAGGTATGGATTGAGATTGGA 3480 

3481 GAAACGGAAGG AACATTTATCGTGGACAGCGTGGAATTAC 3520 

3521 TCCTTATGGAGGAA 3534 

FIGURE 10E 
• • • • 

1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40 
CCA C AC 

• ■ • • 

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80 
C C G A T C T 

81 AAGAAT AGAAACTGGTTACACCCCAATCGATATTTCCTTG 120 
CCT C TC' CC 

• • • • 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160 
CT G A G GC C C G C G A 

• • • • 

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200 
GCTCC CCC T 

• • « • 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240 
C A T C G G 

• • • • 

241 GAAC AGTTAATT AACCAAAGAATAGAAGAATTCGCTAGGA 280 
G GC GGC G C 

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGC AATCTTTA 320 
G C G G T G C 

321 TCAAATTTACGC AGAATCTTTTAG AGAGTGGGAAGCAGAT 360 
C C T GAGC C C 

361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400 
C TC CC C G* A 

401 TCAATG ACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440 
C CTGCA CAT 

441 TTTTGC AGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480 
GC CGCC CGCG 

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520 
C A T C T CC CAGC GC TC 



521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560 
C ' AGC G C T 

• • • • 

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 
AC C CCCCT G 

• • • • 

601 GGCAACTATAC AGATTATGCTGTACGCTGGT ACAAT ACGG 640 
A CCCCC TT CT 

641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680 
C G G C T T A 
• • • • 

681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720 
TACCGCG GCCAT 

• ■ ■ • 

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 760 
G C T GT C C CTCC 

« • • • 

761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800 
CCCTCT G CTC 

• • • • 

801 TTATAC AAACCCAGTATT AGAAAATTTTGATGGTAGTTTT 840 
C T TCTGCCC CC 

• • • • 

841 CGAGGCTCGGCTC AGGGC ATAGAAAGAAGT ATTAGG AGTC 880 
TTTCATC G CTCC C C 

881 CACATTTGATGG AT ATACTTAACAGTATAACCATCTATAC 920 
C C CT G C T C 

« • • • 

921 GGATGCTC ATAGGGGTTATT ATTATTGGTC AGGGCATCAA 960 
C CAAGG C TACG 

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000 
C C ATA CAGC C G T 

• • • • 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040 
CTC C C 

• • • • 

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGT ATAG A 1080 
C T C C 

• . • ■ • 

1081 ACATTATCGTCC ACTTTATATAGAAGACCTTTTAAT ATAG 1120 
CGT C GC CC C 

• « • • 

1121 GGATAAATAATC AACAACTATCTGTTCTTG ACGGGACAGA 1160 
TCCCG TC A 

» • • • 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200 
G C C T T C T 

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240 
G C T CT C C 

• • • • 

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280 
A C T C CTC 

• • • • 

1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320 
CCAGG CGC C CAC 

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360 
G C TCC G C C C 

• • • • 

1361 CTTGGATAC ATCGT AGTGCTGAATTTAATAATATAATTGC 1400 
C G C C- C C C 
• • • • • 

1401 ATCGGATAGTATTACTC AAATCCCTGC AGTGAAGGGAAAC 1440 
C 

• • • • 

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACC AGGATTTA 1480 

c c c c c 

• • • • 

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520 
A C C C C C C 

• • • • 

1521 CATTCAGAAT AGAGGGTATATTGAAGTTCC AATTC ACTTC 1560 



• • • • 

1561 CCATCGACATCT ACCAGATATCGAGTTCGTGTACGGTATG 1600 
C A GA 

• • • • 

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640 
G T . 

• • • • 

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680 
C C T C 

• • • • 

1681 TC ATTAGATAATCTAC AATCAAGTGATTTTGGTT ATTTTG 1720 
C G C C C C C 

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAAT AT AGT 17 60 

C C C C 

• • • • 

1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800 
G C T C 

• • • • 

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840 
C G C 

■ ■ • • 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 
GCCTG C T C 

1881 GCTGTTTACGTCT AC AAACC AACT AGGGCT AAAAAC AAAT 1920 
CC CCCTGTCTG TC 

• • • • 

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960 
TTC C C CGC 

• • • • 

1961 CGTATTTATCGGATGAATTTTGTCTGGATG AAAAGCGAGA 2000 
C CC TAGC G C C C C G T 

• • • • 

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCG ACTCAGTGAT 2040 
CC T CC T CC 

. • • • 

2041 GAACGC AATTT ACTCCAAGATTC AAATTTC AAAGAC ATTA 2080 
GA GCCTG CCC C 

• • • • 

2081 ATAGGC AACCAGAACGTGGGTGGGGCGGAAGTAC AGGGAT 2120 
C G T T C C 
2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160 
C CC TGCGGC 

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200 
CCCATCC C.TC 

2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240 
C CGG GCCC 

» « • • 

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280 
CAG CT CC CC 

* « • • 

2281 GACTTAGAAATCTATTTAATTCGCTACAATGC AAAACATG 2320 
CT CCGCAG CGC 

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360 
GCG C T CC A 

2361 TTC AGCCC AAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400 
T TC C T G T C 

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440 
AT G G C 

2441 GTTCGTGTAGGGATGGAG AAAAGTGTGCCC ATCATTGGC A 2480 
CGC C G C T 

2481 TCATTTCTCCTT AGACATTGATGT AGGATGTAC AGACTTA 2520 
C G C G T C G 

2521 AATGAGGACCT AGGTGTATGGGTG ATCTTTAAGATTAAG A 2560 

C A C C C C 

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600 
C C A T C C T 

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640 

GOT T C 

2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680 
G A G G G G C 

2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720 
C T C CGC 

♦ • • • 

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760 
GCG GCGC G 

2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800 
G CCCCC CCC 

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840 
C G C T G CT 
2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880 
T C CT GCTCCCG 

. • • • 

2881 GAATT AGAAGGGCGT ATTTTC ACTGCATTCTCCCTATATG 2920 
CTGA CTC TGC 

. • • • 

2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960 
C C CGC CCC 

2961 CTTATCCTGCTGG AACGTGAAAGGGCATGT AGATGT AGAA 3000 
C CAG T T G C G G 

• • • * 

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040 
G TG C G GTG 

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080 
T C G A A A 

• • • • 

3081 TCGTGGCTATATCCTTCGTGTCAC AGCGTACAAGGAGGGA 3120 
AA CTC GCT 

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160 
C T G G C C 

. • • • 

3161 AT ACAG ACGAACTGAAGTTTAGCAACTGCGT AGAAGAGGA 3200 
C C G T CTC C G A 

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240 
CC CTTCCCC 

3241 GTAAATC AAGAAGAATACGGAGGTGCGTAC ACTTCTCGTA 3280 
G G G ' C AGC 

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320 
CA T C T T C 

3321 TGCGTC AGTCTATGAAGAAAAATCGTATAC AGATGG ACGA 3360 
CCGCGG C C CA 

• » • • 

3361 AGAGAGAATCCTTGTGAATTTAAC AGAGGGTATAGGG ATT 3400 
CT C CGC TC C 

3401 ACACGCC ACTACCAGTTGGTTATGTGAC AAAAGAATTAGA 3440 
A T - C TCGGCT 

3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480 
G TTG CAG C CT 

3481 GAAACGGAAGGAACATTT ATCGTGGAC AGCGTGGAATTAC 3520 
C G C C GC T 

3521 TCCTTATGGAGGAA 3534 
T G 
1 ATGACTGCAGATAATAATACGGAAGCACTAGATAGCTCTA 40 
CCCC CCCT 

41 CAACAAAAGATGTCATTC AAAAAGGC ATTTCCGTAGT AGG 80 
CTG TCGGTC TG 

• » • • 

81 TGATCTCCTAGGCGTAGTAGGTTTCCCGTTTGGTGGAGCG 120 
ACTG GTATCC C 

• ' » • • 

121 CTTGTTTCGTTTTATACAAACTTTTTAAATACTATTTGGC 160 
C GAGC C CCCC 

t t • • 

161 CAAGTGAAGACCCGTGGAAGGCTTTTATGGAACAAGTAGA 200 
CG T AAC G T 

• • • • • 

201 AGCATTGATGGATCAGAAAATAGCTGATTATGCAAAAAAT 240 
TCT GTA CGC 

• » • • 

241 AAAGCTCTTGCAGAGTTACAGGGCCTTCAAAATAATGTCG 280 
GTG ACC GC G 

281 AAG ATT ATGTGAGTGCATTGAGTTC ATGGC AAAAAAATCC 320 
G C C TCCAGC G G C 

321 TGTGAGTTCACGAAATCCACATAGCCAGGGGCGGATAAGA 360 
T C CA T C A TA C 

• • • • 

361 GAGCTGTTTTCTCAAGCAGAAAGTC ATTTTCGTAATTCAA 400 
T C C TCC C CA A C 

401 TGCCTTCGTTTGC AATTTCTGGAT ACGAGGTTCTATTTCT 440 
AGC T C C T T C 

441 AACAAC ATATGCACAAGCTGCC AAC AC ACATTT ATTTTTA 480 
CTC T CCGCC 

481 CTAAAAGACGCTC AAATTTATGGAGAAGAATGGGGATACG 520 
T G C G 

521 AAAAAGAAGATATTGCTGAATTTTATAAAAGACAACTAAA 560 
G GC GCCGCT T 

561 ACTTACGCAAGAATATACTGACCATTGTGTCAAATGGTAT 600 
G C C G C C G 

• » • • 

601 AATGTTGGATTAGATAAATTAAGAGGTTCATCTTATGAAT 640 
C TCC GCC CTCCG 

641 CTTGGGTAAACTTTAACCGTTATCGCAGAGAGATGACATT 680 
G C A A CA G C 
681 AACAGT ATTAGATTTAATTGCACTATTTCC ATTGTATGAT 720 
GTGCCCTC C C C 

• • • • 

721 GTTCGGCTATACCCAAAAGAAGTTAAAACCGAATTAACAA 760 
GAAC G G TGCTC 

• • • • 

761 GAGACGTTTTAACAGATCCAATTGTCGGAGTCAACAACCT 800 
GC C T C T 

• • • • 

801 TAGGGGCTATGGAACAACCTTCTCTAATATAGAAAATTAT 840 
T T AGC C C C 

• • • • 

841 ATTCGAAAACCACATCTATTTGACTATCTGCATAGAATTC 880 
AG C C T C 

• • • • 

881 AATTTC ACACGCGGTTCC AACC AGGATATT ATGGAAATGA 920 
C AA T C T C 

■ • * • • 

921 CTCTTTCAATTATTGGTCCGGTAATTATGTTTCAACTAGA 960 
C C C C C 

• • • • 

961 CCAAGC AT AGGATCAAATGATATAATCAC ATCTCC ATTCT 1000 
T T C C C 

1001 ATGGAAATAAATCC AGTGAACCTGTACAAAATTTAG AATT 1040 
TCG GGCCTG 

• • • • 

1041 TAATGGAGAAAAAGTCTATAGAGCCGTAGCAAATACAAAT 1080 
C C C G C C C 

1081 CTTGCGGTCTGGCCGTCCGCTGT ATATTCAGGTGTT ACAA 1120 
CTG A AT.C CC* 

1121 AAGTGGAATTTAGCC AATATAATGATCAAAC AGATGAAGC 1160 
G G TG C GC G 

1161 AAGTAC ACAAACGTACGACTCAAAAAGAAATGTTGGCGCG 1200 
CCCGT CCTC A 

1201 GTCAGCTGGGATTCTATCGATCAATTGCCTCCAGAAACAA 1240 
TCT C C 

1241 CAGATGAACCTCTAGAAAAGGGATATAGCCATCAACTCAA 1280 
C ATGG CC C T 

• • • • 

1281 TTATGT AATGTGCTTTTTAATGC AGGGT AGTAGAGG AAC A 1320 
C G C G A TCC G C 

• . » ■ • 

1321 ATCCCAGTGTTAACTTGGACACAT AAAAGTGTAGACTTTT 1360 
T G C C GTCC G C 

1361 TTAACATGATTGATTCGAAAAAAATTACACAACTTCCGTT 1400 
C C AGC G G C T C 
1401 AGTAAAGGCATATAAGTTACAATCTGGTGCTTCCGTTGTC 1440 
G G A C C C G 

• • • • 

1441 GCAGGTCCTAGGTTTACAGGAGGAGATATCATTCAATGCA 1480 
CACT TC CG 

• • • • 

1481 CAGAAAATGGAAGTGCGGCAACTATTTACGTTACACCGGA 1520 
GCCCAT C G T 

• • . • 

1521 TGTGTCGTACTCTCAAAAATATCG AGCTAGAATTC ATT AT 1560 
T G G CA G AC T C 

• * • • 

1561 GCTTCTACATCTCAGATAACATTTACACTC AGTTTAGACG 1600 
A CAGC C C C C G T 

• • • • 

1601 GGGCACCATTTAATCAATACTATTTCGATAAAACGATAAA 1640 
A CCCGTCTCGCC 

• • • • 

1641 TAAAGGAGACACATTAACGTATAATTCATTTAATTTAGCA 1680 
C T TC C A C AGC C C G 

1681 AGTTTCAGCACACCATTCGAATTATCAGGGAATAACTTAC 1720 

T C C C C TC T 

. • • • 

1721 AAATAGGCGTCACAGGATTAAGTGCTGGAG ATAAAGTTT A 1760 
GC CTCCCC C C 

1761 TATAGACAAAATTGAATTTATTCCAGTGAAT 1791 
C C G G C C C 
• • • • 

1 ATG AATAATGTATTGAATAGTGGAAGAACAACTATTT 40 
GAG C C C CTC T CO 

• • • • 

41 GTGATGCGTATAATGTAGTAGCCC ATGATCCATTTAGTTT 80 
CCACCCGTC *CC 

81 TGAAC ATAAATCATTAGATACCATCCAAAAAGAATGGATG 120 
C C GAGCC C C T T G G G 

• • • • 

121 G AGTGG AAAAGAACAGATCATAGTTT ATATGTAGCTCCTG 160 
A C T T C CTC C C C C A 

• • • ' 

161 T AGTCGGAACTGTGTCTAGTTTTTTGCTAAAGAAAGTGGG 200 
GT A CCCCTC GO 

201 GAGTCTT ATTGGAAAAAGGATATTGAGTGAATTATGGGGG 240 
CTC C C CTC TCC C C T 

241 ATAATATTTCCTAGTGGTAGTAC AAATCTAATGC AAG ATA 280 
C C ATC GTCC T C C 

• • • • . • 

281 TTTTAAGGGAGACAGAACAATTCCTAAATCAAAGACTTAA 320 
CG C GTCCGCTC 

• • • • 

321 TACAGATACCCTTGCTCGTGTAAATGCAGAATTGATAGGG 360 
CT TG AACCTG CT 

• • • ' 

361 CTCCAAGCGAATATAAGGGAGTTTAATCAACAAGTAGATA 400 
ACTCT CCG GC 

• • • • 

401 ATTTTTTAAACCCTACTCAAAACCCTGTTCCTTTATC AAT 440 
CCGTA GT G CTC 

« • • * • 

441 AACTTCTTCGGTTAATACAATGCAGCAATTATTTCTAAAT 480 
C CGCT CCCCC 

. * • • • 

481 AGATT ACCCCAGTTCC AGAT ACAAGGAT ACC AGTTGTTAT 520 
G T T T C ■ C CC 

• • • * 

521 TATTACCTTTATTTGCACAGGCAGCCAATATGCATCTTTC 560 

TC T AC C T T C CT G 

• • • • 

561 TTTTATTAGAGATGTTATTCTTAATGC AGATGAATGGGGT 600 
CCACTCGCCCTC A 
• 

601 ATTTCAGCAGCAACATTACGTACGTATCGAGATTACCTGA 640 
C T C TC TA G A CA C T 

• • ■ ' 

641 GAAATTATACAAGAGATTATTCTAATTATTGTATAAATAC 680 
GCCTCT CCC CCC 
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681 GTATCAAACTGCGTTTAGAGGGTTAAACACCCGTTTACAC 720 
T G C C T AC C T TA GC T 

. * • • 

721 G ATATGTT AGAATTT AGAACATATATGTTTTTAAATGT AT 760 
C CTGCGCC CCTCG 

761 TTGAATATGTATCCATTTGGTCATTGTTTAAATATCAGAG 800 
G C CAG AGTC C C G C 

, * . • 

801 TCTTATGGTATCTTCTGGCGCTAATTTATATGCTAGCGGT 840 
CTG GC AC CCC CTCT C 

841 AGTGGACCACAGCAGACACAATCATTTACAGCACAAAACT 880 

A T GAGC C T G 

881 GGCC ATTTTTATATTCTCTTTTCCAAGTT AATTCGAATTA 920 
C G AGCT G C C C C. 

• 

921 TATATTATCTGGT ATTAGTGGTACTAGGCTTTCTATTACC 960 
C TC CAG CTC G C A C C A 

961 TTCCCTAATATTGGTGGTTTACCGGGTAGT ACTACAACTC 1000 
T C C AC T A CTCC C 

1001 ATTCATTGAATAGTGCCAGGGTTAATTATAGCGGAGGAGT 1040 
AGCC T CTC A G C C T T 

1041 TTCATCTGGTCTC ATAGGGGCGACTAATCTC AATCAC AAC 1080 
CAGC AT G T T A CT G C 

• • 

1081 TTTAATTGCAGCACGGTCCTCCCTCCTTTATCAACACCAT 1120 
C TC C T G A C GAGC G 

1121 TTGTTAGAAGTTGGCTGGATTCAGGTAC AGATCGAGAGGG 1160 
G GTCC T CAGC T C A 

1161 CGTTGCTACCTCTACGAATTGGCAGACAGAATCCTTTCAA 1200 
A A C A C G C 

1201 ACAACTTTAAGTTTAAGGTGTGGTGCTTTTTCAGCCCGTG 1240 
CCTCCTC A CTA 

1241 GAAATTCAAACTATTTCCCAGATTATTTTATCCGTAATAT 1280 
G CT CCCTAGC 

1281 TTCTGGGGTTCCTTTAGTTATTAGAAACGAAGATCTAACA 1320 
C T CCCCGT CCC 

1321 AGACCGTTAC ACTATAACC AAATAAG AAATATAGAAAGTC 1360 
CTACTTC GTGCC GTC 

1361 CTTCGGGAACACCTGGTGGAGCACGGGCCTATTTGGTATC 1400 
ACTTAAT AATCCCG 
1401 TGTGC ATAACAGAAAAAAT AATATCTATGCCGCTAATGAA 1440 
C GGCC CTCCG 

• • • • 

1441 AATGGTACTATGATCCATTTGGCGCCAGAAGATTATACAG 1480 
CC TCCTA CT 

• • • • 

1481 GATTTACTATATCGCCAAT ACATGCCACTC AAGTGAATAA 1520 
CCCT C TC C 

« . • • • 

1521 TCAAACTCGAACATTTATTTCTGAAAAATTTGGAAATCAA 1560 
GACCCCC GC 

• • • • 

1561 GGTGATTCCTTAAGATTTGAACAAAGCAACACGACAGCTC 1600 
C GGCGTC TCA- 

f • • • 

1601 GTTATACGCTTAGAGGGAATGGAAATAGTTACAATCTTTA 1640 
GCTT6 C CC C 

1641 TTTAAGAGTATCTTCAATAGGAAATTCAACTATTCGAGTT 1680 
C G TAGC CTTCCCCT 

« • • • 

1681 ACTATAAACGGTAG AGTTT ATACTGTTTC AAATGTTAATA 1720 
CC ACT CACT GC 

• • • * 

1721 CCACTACAAATAACGATGGAGTTAATGATAATGGAGCTCG 1760 
TAGCT C CCC CA 

• ■ • • 

1761 TTTTTCAGATATTAATATCGGTAATATAGTAGCAAGTGAT 1800 
A CAGC CCCTCCCG CTC C 

• « ■ • 

1801 AATACTAATGTAACGCTAGATATAAATGTGACATTAAACT 1840 
C CTTTGCC CCCT 

1841 CCGGTACTCCATTTGATCTCATGAATATTATGTTTGTGCC 1880 
T A C C 

1881 AACTAATCTTCCACCACTTTAT 1902 
OCT T G C 



FIGURE 13C 



EP 0 385 962 A1 



1 ATGGAGGAAAATAATCAAAATCAATGCATACCTTACAATT 40 
G C C C T A C 

• • • • 

41 GTTTAAGTAATCCTGAAGAAGTACTTTTGGATGGAGAACG 80 
CG CA GTGCT 

• • • • 

81 GATATC AACTGGTAATTC ATCAATTGATATTTCTCTGTCA 120 
CT C CTCCCCCT C 

• • • • 

121 CTTGTTCAGTTTCTGGTATCTAACTTTGTACCAGGGGGAG 160 
T G C CAGC C G T T 

• • • • 

161 GATTTTTAGTTGGATTAATAGATTTTGTATGGGGAATAGT 200 

GCCTC C TCCC TC 
• , • • • 

201 TGGCCCTTCTCAATGGGATGCATTTCTAGTACAAATTGAA 240 
T A C G G G 

• • • • * 
241 CAATTAATTAATGAAAGAAT AGCTGAATTTGCTAGG AATG 280 

GGCCGGC GCC C 

• • • • 

281 CTGCTATTGCTAATTTAG AAGGATTAGGAAAC AATTTC AA 320 
CC CG GCT C 

• • ■ • 

321 TATATATGTGGAAGCATTTAAAGAATGGGAAGAAGATCCT 360 
CC GCC G GC 

361 AATAATCCAGAAACCAGGACCAGAGTAATTGATCGCTTTC 400 
C G CCTGGCCAACA 

• • • • 

401 GTATACTTGATGGGCTACTTGAAAGGGACATTCCTTCGTT 440 
ACTG C C CT GG AT C AC 

• • • * 

441 TCGAATTTCTGGATTTGAAGTACCCCTTTTATCCGTTTAT 480 
CA C CC TTCG GC 

• • • • 

481 GCTCAAGCGGCC AATCTGCATCTAGCTATATTAAGAGATT 520 
AT T C C CC TC CA 

521 CTGTAATTTTTGGAGAAAGATGGGGATTGACAACGATAAA 560 
GCC G G CT C 

• • • • 

561 TGTCAATGAAAACTATAATAGACTAATTAGGCATATTGAT 600 
C GTCC TC C C 

. • • • 

601 GAATATGCTGATCACTGTGCAAATACGTATAATCGGGGAT 640 
GCCC TCCCCTC 

• • • * 

641 TAAATAATTTACCGAAATCTACGTATCAAGATTGGATAAC 680 
GCCCTG T T 

681 ATATAATCGATTACGGAGAGACTTAACATTGACTGTATT A 720 
C C CA G GA G CC C A T G 
• • • • 

721 GATATCGCCGCTTTCTTTCCAAACTATGACAATAGGAGAT 760 
C T A C G C 

• • • • 

761 ATCCAATTCAGCCAGTTGGTCAACTAACAAGGGAAGTTTA 800 
CTCA G TCA C 

• • • • 

801 TACGGACCCATTAATTAATTTTAATCCACAGTTACAGTCT 840 
T CT CCCT G AAG 

841 GTAGCTCAATTACCTACTTTTAACGTTATGGAGAGCAGCC 880 
CCCTCAC C TC 

• • • • 

881 GAATTAGAAATCCTCATTTATTTGATATATTGAATAATCT 920 
TCGCACG CC CC 

' • • • • 

921 TACAATCTTTACGGATTGGTTTAGTGTTGGACGCAATTTT 960 
T CC. CC GTCC 

• • • • 

961 TATTGGGGAGGACATCGAGTAATATCTAGCCTTATAGGAG 1000 
T CA G C C CTCT * T 

< » • • 

1001 GTGGT AAC ATAAC ATCTCCT ATAT ATGGAAGAG AGGCGAA 1040 
G T C C C T A 

t • • • . 

1041 CCAGGAGCCTCCAAGATCCTTTACTTTTAATGGACCGGTA 1080 
A C TAGT G C C C T A C 

• • • ♦ 

1081 TTTAGGACTTT ATCAAATCCT ACTTTACGATTATTAC AGC 1120 
C A C G T C C GA GC C • 

• • » • 

1121 AACCTTGGCCAGCGCCACCATTTAATTTACGTGGTGTTGA 1160 

T T C CC TA A 

• • • • 

1161 AGGAGTAGAATTTTCTACACCTACAAATAGCTTTACGTAT 1200 
G C T G C T C CTC C T C 

• • • • 

1201 CGAGGAAG AGGT ACGGTTGATTCTTTAACTG AATTACCGC 1240 
A T AC CGCCCA 

• • • • 

1241 CTGAGGATAATAGTGTGCCACCTCGCG AAGG ATATAGTCA 1280 
A C C CA G C CTCC 

• • • • 

1281 TCGTTT ATGTCATGC AACTTTTGTTC AAAG ATCTGG AACA 1320 
CAGGCC CCGGCTC T 

• • • • 

1321 CCTTTTTTAACAACTGGTGTAGTATTTTCTTGGACCGATC 1360 
ACCCTAATGCA T 

• • • • 

1361 GT AGTGCAACTCTT AC AAATAC AATTGATC C AG AGAG AAT 1400 
T C T C C G 
1401 TAATCAAATACCTTTAGTGAAAGGATTTAG AGTTTGGGGG 1440 
C CAGCGTCCTG A 

• • • • 

1441 GGCACCTCTGTCATTACAGGACCAGGATTTACAGGAGGGG 1480 
AT C C C . T 

• • « • 

1481 ATATCCTTCGAAGAAATACCTTTGGTGATTTTGTATCTCT 1520 
T A C T C C GAGC 

1521 ACAAGTCAATATTAATTCACCAATTACCCAAAGATACCGT 1560 
C TCCCT' T T 

1561 TTAAGATTTCGTTACGCTTCCAGTAGGGATGCACGAGTTA 1600 
C C G A TTCCC T C TA C 

1601 TAGTATTAACAGGAGCGGCATCCACAGGAGTGGGAGGCCA 1640 
CGCCCCATTCTCTA 

1641 AGTTAGTGTAAATATGCCTCTTCAGAAAACTATGGAAATA 1680 
CTCC G C AC G G C 

1681 GGGGAGAACTTAACATCTAGAACATTTAGATATACCGATT 1720 
C G CGCC C C 

1721 TTAGTAATCCTTTTTCATTTAGAGCTAATCCAGATATAAT 1760 
CTC C CAGT CC T C C T C C 

• • • t 

1761 TGGGATAAGTGAACAACCTCTATTTGGTGCAGGTTCTATT 1800 
CTC C AT AGC C 

1801 AGTAGCGGTG AACTTTAT ATAG AT AAAATTGAAATT ATTC 1840 
TCATCT C TGCTCG GC 

1841 TAGCAG ATGCAACATTTG AAGCAGAATCTGATTT AGAAAG 1880 
TCCTCCCGTG ACA CC T G 

1881 AGCACAAAAGGCGGTGAATGCCCTGTTTACTTCTTCCAAT 1920 
C G T C C C CA 

1921 CAAATCGGGTTAAAAACCGATGTGACGGATTATCATATTG 1960 
GCTCG TACTTC C 

1961 ATCAAGTATCC AATTTAGTGGATTGTTTATC AGATGAATT 2000 
C G C G CACC ACC TAGC G 

2001 TTGTCTGGATGAAAAGCG AGAATTGTCCGAGAAAGTCAAA 2040 
CCCCG T CC T 

2041 C ATGCGAAGCG ACTC AGTG ATGAGCGGAATTTACTTCAAG 2080 
CC T CCA CCTG 

2081 ATCCAAACTTCAGAGGGATCAATAGACAACCAGACCGTGG 2120 
CT C A AC C G G A 
2121 CTGGAGAGGAAGTACAGATATTACCATCCAAGGAGGAGAT 2160 
TGT CCGGC CC 

• * • • 

2161 GACGTATTC AAAGAGAATT ACGTC ACACT ACCGGGT ACCG 2200 
TG G C CCTCATT 

2201 TTGATGAGTGCTATCCAACGTATTTATATCAGAAAATAGA 2240 
CC CTCCGC .GO 

• * « ■ ■ 

2241 TGAGTCGAAATTAAAAGCTTATACCCGTTATGAATTAAGA 2280 
C CC CTC AG CCT 

• • • • 

2281 GGGTATATCGAAGATAGTCAAGACTTAGAAATCTATTTGA 2320 
CC CC CT CC 

• • • • 

2321 TCCGTTACAATGCAAAAC ACGAAATAGTAAATGTGCC AGG 2360 
AG CG GCCG C 

2361 CACGGGTTCCTTATGGCCGCTTTC AGCCCAAATGCC AATC 2400 
T T C C A T TCT C T 

2401 GGAAAGTGTGGAGAACCGAATCGATGCGCGCCAC ACCTTG 2440 
G G T CA T 

2441 AATGGAATCCTGATCTAGATTGTTCCTGCAGAGACGGGGA 2480 
G CTGCC -GTC 

• * • • 

2481 AAAATGTGCACATCATTCCCATCATTTCACCTTGGATATT 2520 
GG CC T CT .CC 

2521 GATGTTGGATGTACAGACTTAAATGAGGACTTAGGTGTAT 2560 
G TCG CCAC 

2561 GGGTGATATTCAAGATTAAGACGCAAGATGGCCATGCAAG 2600 
C C C C C A C 

2601 ACTAGGGAATCT AGAGTTTCTCGAAGAGAAACC ATT ATTA 2640 
T C C T GG C 

2641 GGGGAAGCACTAGCTCGTGTGAAAAGAGCGGAGAAGAAGT 2680 
T T C G A 

• • • • 

2681 GGAGAGACAAACGAGAGAAACTGC AGTTGG AAAC AAATAT 2720 
G T CG A G T C 

• • • • 

2721 TGTTTATAAAGAGGCAAAAGAATCTGT AG ATGCTTT ATTT 2760 
C CG C GCG GC 

2761 GTAAACTCTCAATATGATAGATTACAAGTGGATACGAACA 2800 
G C CAG G CC C C 

2801 TCGCCATGATTC ATGCGGCAGATAAACGCGTTCATAGAAT 2840 
CCC C TGCC 
2841 CCGGGAAGCGTATCTGCCAGAGTTGTCTGTGATTCC AGGT 2880 
TTGTCT T C CT 

• • 

2881 GTCAATGCGGCCATTTTCGAAGAATTAGAGGGACGTATTT 2920 
GOT C GCT C 

• • • • 

2921 TTACAGCGTATTCCTTATATGATGCGAGAAATGTCATTAA 2960 
CATC GC C C C 

* • • • 

2961 AAATGGCGATTTCAATAATGGCTTATTATGCTGGAACGTG 3000 
G C T C C C CAGC T 

• • 

3001 AAAGGTCATGTAGATGTAGAAGAGCAAAACAACCACCGTT 3040 

GCG6AG TG 
• • 

3041 CGGTCCTTGTTATCCCAGAATGGGAGGCAG AAGTGTCACA 3080 
C GGGTG AT C 

• • 

3081 AGAGGTTCGTGT CTGTCCAGGTCGTGGCTATATCCTTCGT 3120 
» A A A C T C 

• • 

3121 GTCACAGCATATAAAGAGGGATATGGAGAGGGCTGCGTAA 3160 
GCTCG CT T G 

3161 CGATCCATGAGATCGAAGACAATACAGACGAACTGAAATT 3200 
C C GACC GTG 

3201 CAGCAACTGTGTAGAAGAGGAAGTATATCCAAACAACACA 3240 
TC CC.GAAC C C 

• 

3241 GTAACGTGTAAT AATTAT ACTGGG ACTC AAGAAGAATATG 3280 
TTCCGCC TA G GC 

3281 AGGGT ACGT ACACTTCTC GTAAT.C AAGG AT ATG ACG AAGC 3320 
GA G C AGC GAG T CA 

• • • * 

3321 CTATGGTAATAACCCTTCCGTACCAGCTGATTACGCTTCA 3360 

TCC TCXXXXXXXXXXXX T T C T C C 

* • • " * 

3361 GTCTATGAAG AAAAATCGTAT AC AGATGGACGAAG AGAG A 3400 
GCGG CC CACT 

• • ' • 

3401 ATCCTTGTGAATCTAACAGAGGCT ATGGGG ATTAC ACACC 3440 
C C G TC T CA C 

• • • * 

3441 ACTACCGGCTGGTTATGT AACAAAGGATTT AG AGTACTTC 3480 

TATC TC GCT T 

3481 CCAGAGACCGATAAGGTATGGATTGAGATCGGAGAAACAG 3520 
T CAGC T C 

3521 AAGGAACATTCATCGTGGATAGCGTGGAATTACTCCTTAT 3560 
G C C GC T T G 
1 AGATCTAGAGGTAATTGTTATGAGTACTGTCGTGGTTAAG 40 

GATC 

41 GGAAACGTCAACGGTGGTGTACAACAACCTAGAAGGAGGA 80 
G T A 

81 GAAGGCAATCCCTTCGCAGGAGGGCTAACAGAGTACAGCC 120 

T A T 

. • • • 

121 AGTGGTTATGGTC ACTGCTCCTGGCGAACCC AGGAGGAGG 160 

GC A A A 

• • • • 

161 AGACGCAGAAGAGGAGGCAATCGCAGGTCAAGAAGAACTG 200 
AG T A 

• • • • 

201 GAGTTCCCAGGGGAAGGGGCTCAAGCGAGACATTCGTGTT 240 
A A T 

• . • • 

241 TACAAAGGACAACCTCGTGGGCAACTCCCAAGGAAGTTTC 280 



281 ACCTTCGGACCAAGTGTATC AGACTGTCCAGC ATTCAAGG 320 

T 

. • • • 

321 ATGGAATACTCAAGGCCTACCATGAGTACAAGATCACAAG 360 

T 

• , • • ' 

361 TATCCTTCTTCAGTTCGTCAGCGAGGCCTCTTCCACCTCA 400 
T G T 

• • • 

401 CCAGGATCCATCGCTTATG AGTTGG ACCCAC ATTGCAAAG 440 
C AT 

. . • • 

441 TATCATCCCTCCAGTCCTACGTCAACAAGTTCCAAATCAC 480 

T 

. • • • 

481 AAAGGGAGGAGCTAAGACCTATCAAGCTAGGATGATCAAC 520 
T T C T 

• • • • 

521 GGAGTAGAATGGC ACGATTCATCTGAGGATCAGTGCAGGA 560 
T T A 

• • • • 

561 TACTTTGGAAAGGAAGTGGAAAATCTTCAGACCCAGCAGG 600 

C A G T T 

• • • • 

601 ATCTTTCAGAGTCACCATCAGAGTGGCTCTTCAAAACCCC 640 

T T A 

• • • • 

641 AAGTAATAGACTCCGGATCAGAGCCTGGTCCAAGCCCACA 680 

A T 
• • • • 

681 ACCAACACCCACTCCAACTCCCCAAAAGCATGAGCGATTT 720 

• » • • 

721 ATTGCTTACGTCGGCATACCTATGCTGACCAiTCAAGAAT 760 
761 TC 762 